Abstract

We derive properties of Latent Variable Models for networks, a broad class of 
models that includes the widely-used Latent Position Models. These include the 
average degree distribution, clustering coefficient, average path length and degree 
correlations. We introduce the Gaussian Latent Position Model, and derive analytic
expressions and asymptotic approximations for its network properties. We
pay particular attention to one special case, the Gaussian Latent Position Model
with Random Effects, and show that it can represent the heavy-tailed degree
distributions, positive asymptotic clustering coefficients and small-world behaviours that
are often observed in social networks. Several real and simulated examples illustrate 
the ability of the models to capture important features of observed networks. 

Keywords: Fitness models, Latent Position Models, Latent Variable Models, Social
networks, Random graphs.


1 Introduction 


Networks are tools for representing relations between entities. Examples include social
networks, such as acquaintance networks (Amaral et al. 2000), collaboration networks
(Newman 2001) and interaction networks (Perry and Wolfe 2013), technological networks
such as the World Wide Web (Albert et al. 2000), and biological networks such
as neural networks (Watts and Strogatz 1998), food webs (Williams and Martinez 2000),
and protein-protein interaction networks (Raftery et al. 2012).
Social networks, specifically, tend to exhibit transitivity (Newman 2003a), clustering,
homophily (Newman and Park 2003), the scale-free property (Newman 2002b) and
small-world behaviours (Watts and Strogatz 1998).

Networks are typically modelled in terms of random graphs. The set of nodes is
fixed, and a probability distribution is defined over the space of all possible sets of edges,
thereby considering the observed network as a realisation of a random variable.

One way to study networks is to define a simple generative mechanism that captures
some important basic properties, such as the degree distribution (Newman et al. 2001),
clustering (Newman 2009), or small-world behaviour (Watts and Strogatz 1998). These
models are deliberately kept simple so that they can be easily fitted and studied. Theoretical
tractability can allow the asymptotic properties of the fitted models to be assessed, and
this can help to determine how well the models might fit large real networks. It can
also allow the relationships between statistics measuring clustering, power-law behaviour
and small-world behaviour to be assessed (Kiss and Green 2008; Newman 2009; Watts
and Strogatz 1998).

On the other hand, various statistical models have been proposed, including Exponential
Random Graph Models (Frank and Strauss 1986; Caimo and Friel 2011; Krivitsky and
Handcock 2014), Latent Stochastic Blockmodels (Nowicki and Snijders 2001; Airoldi et al.
2008; Latouche et al.), and Latent Position Models (Hoff et al. 2002; Raftery et al.
2012). These try to capture all the main features of observed networks within a unified
framework. However, due to their more complicated structure, only limited research
has been carried out to assess their properties (Daudin et al. 2008; Channarond et al.
2012; Ambroise and Matias 2012; Mariadassou and Matias 2015). Moreover, recent
developments (Chatterjee and Diaconis 2013; Shalizi and Rinaldo 2013; Schweinberger and
Handcock 2015) have shed light on some important limitations of ERGMs, questioning
their suitability as statistical models for networks.

In this paper, we attempt to fill this gap by deriving theoretical properties of a wide
family of network models, which we call Latent Variable Models (LVMs). This family
includes one well-known class of statistical network models as a special case, namely the
Latent Position Models (LPM) (Hoff et al. 2002; Handcock et al. 2007; Krivitsky et al.
2009). These are defined by associating an unobserved latent position in Euclidean space
with each node, and postulating that nodes that are closer are more likely to be linked,
with the probability of connection depending on the distance, typically through a logistic
regression model. In the last decade, LPMs and their extensions have been widely used
for applications such as the analysis of international investment (Cao and Ward 2014),
trophic food webs (Chiu and Westveld 2011, 2014), signal processing (Wang et al. 2014),
and education research (Sweet et al. 2013).


Analytic expressions for the clustering properties of this model in its original form are 
hard to derive. Because of this, we propose a new but closely related model, the Gaussian 
Latent Position Model. This yields simple analytic expressions or asymptotic approximations
for several important clustering properties, including a complete characterisation of
the degree distribution, the clustering coefficient, and the distribution of path lengths. 
The availability of analytic expressions facilitates the analysis of very large graphs since, 
for example, simulation is not required. 

One result is that the Gaussian LPM can represent transitivity asymptotically, because
its clustering coefficient can be asymptotically non-zero, unlike the Erdős-Rényi
and Exponential Random Graph Models, whose clustering coefficient converges to zero.

One implication of our results is that the Latent Position Model in its original form 
cannot represent heavy-tailed degree distributions, such as power-law behaviour, or small- 
world behaviour, as measured by the average path length. As a result, we introduce the 
Gaussian Latent Position Model with Random Effects (LPMRE), and show that it can 
overcome these limitations and capture important features of large-size real networks. 
These results suggest that the Gaussian LPMRE may be a good model for social networks. 

The paper is organised as follows. In Section 2 the notation is set and the main models
of interest are defined. Section 3 gives the core theoretical results used in the paper.
Section 4 makes use of such results to further analyse important features of LPMs, such
as transitivity, homophily, scale-free properties and small-world behaviours. In Section 5
the appealing properties of Gaussian LPMREs are illustrated through empirical studies
and examples. Section 6 provides several real data studies, while Section 7 concludes the
paper with some final remarks.

2 Latent Variable Network Models 

2.1 Notation and model assumptions 

Here we introduce our notation and define the various latent variable models for networks
that we consider.

A1. $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a binary random graph where $\mathcal{V}$ is the set of node labels and $\mathcal{E}$ is
the set of random edges. The observed data consist of a realisation of $\mathcal{G}$. We denote
$\mathcal{V} = \{1, \dots, n\}$ and represent the observed edges through the adjacency matrix
$Y = \{y_{ij}\}_{i,j \in \mathcal{V}}$, where:

$$ y_{ij} = \begin{cases} 1, & \text{if an edge from } i \text{ to } j \text{ appears in the graph,} \\ 0, & \text{otherwise.} \end{cases} \tag{2.1} $$

Furthermore we assume that edges are undirected and self-edges are not allowed, i.e.
$y_{ij} = y_{ji}$, $\forall (i,j) \in \tilde{\mathcal{V}} := \{(i,j) : 1 \le i < j \le n\}$ and $y_{ii} = 0$, $\forall i \in \mathcal{V}$, respectively. Our
analysis can easily be extended to the case of directed edges, however.


A Latent Variable Model (LVM) for networks is defined by associating an unobserved
random variable $Z_i \in \mathcal{Z}$ to actor $i$, $\forall i \in \mathcal{V}$, for some discrete or continuous set $\mathcal{Z}$. The set
of quantities $\mathcal{P} = \{z_1, \dots, z_n\}$ denotes a realisation of the corresponding random process.

A2. The latent variables $Z_1, \dots, Z_n$ are independent and identically distributed, where
each $Z_i$ is distributed according to the probability measure $p(\cdot)$.


A3. Edges are assumed to be conditionally independent given the latent variables. Thus
$\forall (i,j) \in \tilde{\mathcal{V}}$, $Y_{ij}$ is a Bernoulli random variable such that

$$ \Pr(Y_{ij} = 1 \mid z_i, z_j) = 1 - \Pr(Y_{ij} = 0 \mid z_i, z_j) = r(z_i, z_j). \tag{2.2} $$


The modelling assumptions A1-A3 are very general, and in fact various models of
interest satisfy these, including the Random Connection Models of Meester (1996), the
Fitness models of Caldarelli et al. (2002); Söderberg (2002), the LPMs of Hoff et al.
(2002); Handcock et al. (2007); Krivitsky et al. (2009), and the Stochastic Blockmodel
of Nowicki and Snijders (2001), among others. We now give more specific modelling
assumptions that characterise Latent Position Models.


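A4. In the LPM, the realised latent variables $z_i$ in A2 are points in the Euclidean
space $\mathbb{R}^d$, for a fixed $d$, and they are normally distributed:

$$ p(\mathcal{P} \mid \gamma) = \prod_{i=1}^{n} f_d(z_i; 0, \gamma) = \prod_{i=1}^{n} (2\pi\gamma)^{-d/2} \exp\left\{ -\frac{1}{2\gamma} z_i^t z_i \right\}. \tag{2.3} $$

In (2.3), $\gamma$ is a positive real parameter and $f_d(\cdot\,; \mu, \gamma)$ is the multivariate Gaussian density
function with parameters $\mu$ (mean) and $\gamma I_d$ (covariance), where $I_d$ is the $d \times d$ identity
matrix and $A^t$ denotes the transpose of the matrix or vector $A$.


A5. In our specification of the LPM, the Gaussian LPM, the Bernoulli parameters in
A3 are given by:

$$ r(z_i, z_j) = \tau \exp\left\{ -\frac{1}{2\varphi} (z_i - z_j)^t (z_i - z_j) \right\}, \tag{2.4} $$

where $\varphi > 0$, $\tau \in [0, 1]$.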
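
The generative process described by A1-A5 can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: it assumes that positions are drawn from a $d$-variate Gaussian with mean zero and covariance $\gamma I_d$, and that the connection function in (2.4) has the form $\tau \exp\{-\|z_i - z_j\|^2 / (2\varphi)\}$, which is the form implied by Corollary 1. The function name `sample_gaussian_lpm` and the parameter values are hypothetical.

```python
import math
import random


def sample_gaussian_lpm(n, d, gamma, tau, phi, seed=0):
    """Sample latent positions and an undirected adjacency matrix (A1-A5).

    Assumes z_i ~ N(0, gamma * I_d) and
    Pr(y_ij = 1 | z_i, z_j) = tau * exp(-||z_i - z_j||^2 / (2 * phi)).
    """
    rng = random.Random(seed)
    sd = math.sqrt(gamma)  # gamma is a variance, so the s.d. is sqrt(gamma)
    z = [[rng.gauss(0.0, sd) for _ in range(d)] for _ in range(n)]
    y = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):  # undirected graph, no self-edges
            sq_dist = sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
            p_edge = tau * math.exp(-sq_dist / (2.0 * phi))
            if rng.random() < p_edge:
                y[i][j] = y[j][i] = 1
    return z, y


z, y = sample_gaussian_lpm(n=200, d=2, gamma=1.0, tau=0.8, phi=0.5)
mean_degree = sum(map(sum, y)) / len(y)
print(f"empirical average degree: {mean_degree:.2f}")
```

The double loop visits every pair of nodes, so the cost of this sketch grows quadratically in the number of nodes.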
Assumption A5 is slightly different from the original formulation of the LPM of Hoff
et al. (2002), in that the logistic connection function for the edges has been replaced by
a non-normalised Gaussian density. The reasoning behind this choice will be addressed
in Section 2.2.


A6. In the Logistic LPM of Hoff et al. (2002), the Bernoulli parameters in A3 are given
by:

$$ r(z_i, z_j) = \frac{\exp\{\alpha - \beta d(z_i, z_j)\}}{1 + \exp\{\alpha - \beta d(z_i, z_j)\}}, \tag{2.5} $$

where $\alpha \in \mathbb{R}$, $\beta > 0$ and $d(z_i, z_j)$ is the Euclidean distance between the latent positions
$z_i$ and $z_j$.


2.1.1 Extensions of Latent Position Models 


Two major extensions of the LPMs of Hoff et al. (2002) are Handcock et al. (2007)
and Krivitsky et al. (2009). In the former, clustering is introduced through a mixture
distribution on the latent process for nodal positions, while in the latter, nodal random
effects are introduced to capture degree heterogeneity. In a similar fashion we introduce
two variations of A4 and A5 to characterise the two cases:


A7. The latent positions are distributed according to a finite mixture of Gaussian
distributions, i.e.:

$$ p(\mathcal{P} \mid \pi, \mu, \gamma) = \prod_{i=1}^{n} \left[ \sum_{g=1}^{G} \pi_g f_d(z_i; \mu_g, \gamma_g) \right], \tag{2.6} $$

where $\pi$ are the mixture weights, $\mu$ and $\gamma$ are the parameters for the components and
$G$ is the number of groups. The components are all assumed to arise from densities with
circular contours, but possibly different volumes.


A8. For every node $s \in \mathcal{V}$, the latent information is composed of the realisation $z_s$
of a random latent position $Z_s$, which is distributed according to $p(\cdot)$, and a random
effect $\varphi_s$. This random effect is independent of $Z_s$ and is distributed according to an
Inverse Gamma distribution with parameters $\beta_0$ and $\beta_1$. Also, the connection probability
is modified as follows:

$$ \Pr(Y_{ij} = 1 \mid z_i, z_j, \varphi_i, \varphi_j, \tau) = \tau \exp\left\{ -\frac{1}{2(\varphi_i + \varphi_j)} (z_i - z_j)^t (z_i - z_j) \right\}. \tag{2.7} $$

We call this the Gaussian Latent Position Model with Random Effects, or Gaussian
LPMRE.
Different combinations of assumptions A1-A8 generate different Latent Variable Models.
The main cases considered in the present paper are summarised in Table 1.

Table 1: Latent Variable Models considered in the present paper. Latent variables are omitted
from model parameters.

Notation          Description                      Assumptions      Model parameters
LVM               Latent Variable Model            A1-A3            unspecified
Logistic LPM      LPM of Hoff et al. (2002)        A1-A4, A6        $\gamma, \alpha, \beta$
Gaussian LPM      Gaussian connection LPM          A1-A5            $\gamma, \tau, \varphi$
Gaussian LPCM     Clustering LPM                   A1-A3, A5, A7    $\pi, \mu, \gamma, \tau, \varphi$
Gaussian LPMRE    1-cluster with random effects    A1-A4, A8        $\gamma, \tau, \beta_0, \beta_1$

2.2 Motivation for the Gaussian likelihood assumption 


The Logistic LPM has been widely used in network models. Assumption A5 introduces a
new function to define the probability of edges, which is proportional to a non-normalised
Gaussian density. Other variations in the form of the likelihood function have been
proposed in the statistical community (Gollini and Murphy 2014), but the reasoning
behind the Gaussian function mainly comes from the physics literature (Deprez and
Wüthrich 2013; Penrose 1991; Meester 1996). The main advantage of using the Gaussian
function in place of the Logistic function is that it makes it easier to derive theoretical
properties without much changing the generative process of the networks.

In the Gaussian function the model parameters $\tau$ and $\varphi$ appear. The role of $\tau$ is to
control the sparsity in the network, and to allow for the fact that nodes having the same
latent position might not be connected.

The parameter $\varphi$ encompasses the core idea of the LPM, relating the probability
of edges to the distance between latent positions. Indeed, the larger the parameter $\varphi$,
the more long-range edges are supported. Moreover, as $\varphi$ goes to infinity, the model
degenerates to an Erdős-Rényi random graph with connection probability $\tau$.

Essentially, the difference between the two assumptions reduces to the fact that, as
a function of the distance between nodes, the slopes of the curves are different (Figure
2.1). Even though an equivalence result is not provable, we argue that the properties of
the Gaussian LPM are comparable and analogous to those of the Logistic LPM.
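
To make the comparison between the two connection functions concrete, the following sketch evaluates both as functions of the distance between two nodes. It is an illustration only: the Gaussian form $\tau \exp\{-d^2/(2\varphi)\}$ is the one assumed from A5, the logistic form follows A6, and the parameter values (`tau`, `phi`, `alpha`, `beta`) are arbitrary choices rather than values used in the paper.

```python
import math


def gaussian_connection(dist, tau=1.0, phi=1.0):
    """Gaussian connection function of A5 as a function of distance."""
    return tau * math.exp(-dist ** 2 / (2.0 * phi))


def logistic_connection(dist, alpha=0.0, beta=1.0):
    """Logistic connection function of A6 as a function of distance."""
    eta = alpha - beta * dist
    return 1.0 / (1.0 + math.exp(-eta))


for dist in (0.0, 0.5, 1.0, 2.0, 4.0):
    g = gaussian_connection(dist)
    l = logistic_connection(dist)
    print(f"distance {dist:3.1f}: Gaussian {g:.4f}, Logistic {l:.4f}")
```

Both functions are largest at distance zero and decrease monotonically, but the Gaussian function decays faster at large distances.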


3 Theoretical results 

In this section, we provide several theoretical results about LVMs, describing the
distributions of features of networks realised from such models.

Figure 2.1 (panels: Logistic likelihood; Gaussian likelihood): Comparison between the Logistic and Gaussian connection functions, with $\tau = \gamma = 1$.
As a function of the distance between the nodes, the likelihood of a connection in both cases
reaches its maximum when the distance is null, and decreases to zero as the distance increases.


3.1 Properties of the degrees 


The degree of an arbitrary actor $s$ is a discrete random variable defined by $D_s = \sum_{j \in \mathcal{V}} Y_{sj}$.
In this subsection, the properties of the degrees are characterised, describing their mixing
behaviour and the distribution of the degree of a randomly chosen node, identified by the
vector $p = (p_0, \dots, p_{n-1})$, where $p_k = \Pr(D = k)$, $\forall k = 0, \dots, n-1$. To study the degree
distribution of general LVMs (including LPMs), we propose a framework resembling that
of Newman et al. (2001), which relies on the use of Probability Generating Functions
(PGFs).

The study will focus on the following quantities:

• D1: $\theta(z_s)$, defined as the probability that an actor chosen at random is a neighbour
of a node with latent information $z_s$.

• D2: The PGF of the degree of a randomly chosen actor, $G(x) = \sum_{k=0}^{n-1} x^k p_k$.

• D3: The factorial moments of the degree of a randomly chosen actor. Note that
central and non-central moments can be recovered iteratively from factorial moments.

• D4: The first factorial moment, i.e. the average degree of a random node: $\bar{k}$.

• D5: The values of $p_k$, for every $k = 0, \dots, n-1$.

• D6: $\bar{k}(z_s)$, defined as the average degree of a node with latent information $z_s$.

• D7: $k_{nn}(z_s)$, defined as the average degree of the neighbours of a node with latent
information $z_s$.

• D8: $k_{nn}(k)$, defined as the average degree of the neighbours of a node with degree
$k$.

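The following main result characterises all of the quantities listed under a very general
LVM:


Theorem 1. Under assumptions A1-A3, the following results hold:

D1: $$ \theta(z_s) = \int_{\mathcal{Z}} p(z_j)\, r(z_s, z_j)\, dz_j \tag{3.1} $$

D2: $$ G(x) = \int_{\mathcal{Z}} p(z_s) \left[ x\theta(z_s) + 1 - \theta(z_s) \right]^{n-1} dz_s \tag{3.2} $$

D3: $$ \frac{d^r G}{dx^r}(1) = \frac{(n-1)!}{(n-r-1)!} \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)^r\, dz_s \tag{3.3} $$

D4: $$ \bar{k} = (n-1) \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)\, dz_s \tag{3.4} $$

D5: $$ p_k = \int_{\mathcal{Z}} p(z_s) \binom{n-1}{k} \theta(z_s)^k \left[ 1 - \theta(z_s) \right]^{n-k-1} dz_s \tag{3.5} $$

D6: $$ \bar{k}(z_s) = (n-1)\, \theta(z_s) \tag{3.6} $$

D7: $$ k_{nn}(z_s) = 1 + \frac{n-2}{\theta(z_s)} \int_{\mathcal{Z}} p(z_j)\, r(z_s, z_j)\, \theta(z_j)\, dz_j \tag{3.7} $$

D8: $$ k_{nn}(k) = \frac{1}{p_k} \int_{\mathcal{Z}} p(z_j) \binom{n-1}{k} \theta(z_j)^k \left[ 1 - \theta(z_j) \right]^{n-k-1} k_{nn}(z_j)\, dz_j \tag{3.8} $$


The proof of Theorem 1 is provided in Appendix A.1.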


Remark. Equation (3.8) is a generalisation of a result from Boguñá and Pastor-Satorras
(2003), where a general framework to study the degree correlations for the fitness model
of Caldarelli et al. (2002) and Söderberg (2002) is introduced.

Remark. Particular instances of some of the results of Theorem 1 have already been
shown in Olhede and Wolfe (2012) and Channarond et al. (2012); Daudin et al. (2008)
for Stochastic Block Models and Fitness models, without resorting to PGFs. Theorem 1
encompasses those special cases and extends the range of results offered.


The results presented in Theorem 1 are valid for all LVMs. Essentially, they relate
the distributional assumptions about the latent variables and the edge probabilities to
the properties of the degrees of the realised networks.

We now apply these results to LPMs. The following Corollaries show how the formulas
involved in D1-D8 simplify under the Gaussian models of Table 1. Proofs are shown in
Appendices A.1.1 and A.1.2.
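Corollary 1. Under the Gaussian LPM, the following quantities have an explicit form:

D1: $$ \theta(z_s) = \tau \left( \frac{\varphi}{\gamma + \varphi} \right)^{d/2} \exp\left\{ -\frac{1}{2(\gamma + \varphi)} z_s^t z_s \right\} \tag{3.9} $$

D3: $$ \frac{d^r G}{dx^r}(1) = \frac{(n-1)!}{(n-r-1)!}\, \tau^r \left\{ \frac{\varphi^r}{(\gamma + \varphi)^{r-1} \left[ (r+1)\gamma + \varphi \right]} \right\}^{d/2} \tag{3.10} $$

D4: $$ \bar{k} = (n-1)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2} \tag{3.11} $$

D7: $$ k_{nn}(z_s) = 1 + \bar{k}\, \frac{n-2}{n-1}\, \frac{f_d\left( z_s; 0, \frac{\gamma^2 + 3\gamma\varphi + \varphi^2}{2\gamma + \varphi} \right)}{f_d(z_s; 0, \gamma + \varphi)} \tag{3.12} $$

Note that $\theta(\cdot)$ has an explicit expression, thus evaluation of the quantities in D2,
D5 and D8 boils down to an approximation of a single integral.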
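
As a sanity check on these closed forms, the sketch below codes D1 and D3 and compares D1 with a direct numerical evaluation of the integral in (3.1) for $d = 1$. It is an illustration under stated assumptions: the closed forms are taken to be $\theta(z_s) = \tau (\varphi/(\gamma+\varphi))^{d/2} \exp\{-\|z_s\|^2 / (2(\gamma+\varphi))\}$ and the factorial-moment expression with the factor $\{\varphi^r / ((\gamma+\varphi)^{r-1}[(r+1)\gamma+\varphi])\}^{d/2}$; the function names and the midpoint-rule settings are hypothetical.

```python
import math


def theta_closed(z_s, gamma, tau, phi):
    """D1 in closed form; z_s is a list of d coordinates."""
    d = len(z_s)
    sq_norm = sum(c * c for c in z_s)
    return (tau * (phi / (gamma + phi)) ** (d / 2)
            * math.exp(-sq_norm / (2 * (gamma + phi))))


def factorial_moment(r, n, d, gamma, tau, phi):
    """D3 in closed form: r-th factorial moment of the degree."""
    falling = math.prod(range(n - r, n))  # (n-1)!/(n-r-1)!
    inner = phi ** r / ((gamma + phi) ** (r - 1) * ((r + 1) * gamma + phi))
    return falling * tau ** r * inner ** (d / 2)


def theta_numeric_1d(z_s, gamma, tau, phi, grid=20000, width=10.0):
    """Midpoint-rule evaluation of the integral defining D1, for d = 1."""
    sd = math.sqrt(gamma)
    lo, hi = -width * sd, width * sd
    step = (hi - lo) / grid
    total = 0.0
    for m in range(grid):
        z = lo + (m + 0.5) * step
        dens = math.exp(-z * z / (2 * gamma)) / math.sqrt(2 * math.pi * gamma)
        total += dens * tau * math.exp(-(z_s - z) ** 2 / (2 * phi)) * step
    return total


print(theta_closed([0.7], 1.0, 0.8, 0.5), theta_numeric_1d(0.7, 1.0, 0.8, 0.5))
print("average degree (r = 1):", factorial_moment(1, 100, 2, 1.0, 0.8, 0.5))
```

Setting `r=1` in `factorial_moment` gives the average degree of D4, since the first factorial moment is the mean.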


Corollary 2. Under the Gaussian LPCM, the following results hold:

D1: $$ \theta(z_s) = \tau (2\pi\varphi)^{d/2} \sum_{g=1}^{G} \pi_g f_d(z_s; \mu_g, \gamma_g + \varphi) \tag{3.13} $$

D4: $$ \bar{k} = (n-1)\, \tau (2\pi\varphi)^{d/2} \sum_{g=1}^{G} \sum_{h=1}^{G} \pi_g \pi_h f_d(\mu_g - \mu_h; 0, \gamma_g + \gamma_h + \varphi) \tag{3.14} $$

Also, the degree distribution is a continuous mixture of binomial distributions, where the
mixture weights are themselves distributed as mixtures of Gaussians:

D5: $$ p_k = \int_{\mathcal{Z}} \left[ \sum_{g=1}^{G} \pi_g f_d(z_s; \mu_g, \gamma_g) \right] \binom{n-1}{k} \theta(z_s)^k \left[ 1 - \theta(z_s) \right]^{n-k-1} dz_s. \tag{3.15} $$


Under the Gaussian LPMRE, none of the equations can be written explicitly, since
the integrals over the random effects cannot be evaluated analytically. However, we will
make use of the following two quantities, which will be calculated in an approximate
form:

$$ \theta(z_s, \varphi_s) = \int\!\!\int f_d(z_j; 0, \gamma)\, p(\varphi_j \mid \beta_0, \beta_1)\, r(z_s, z_j)\, d\varphi_j\, dz_j, \tag{3.16} $$

$$ G^{(r)}(1) = \frac{(n-1)!}{(n-r-1)!} \int\!\!\int f_d(z_s; 0, \gamma)\, p(\varphi_s \mid \beta_0, \beta_1)\, \theta(z_s, \varphi_s)^r\, dz_s\, d\varphi_s. \tag{3.17} $$

Remark. The advantage of using the Gaussian function rather than the Logistic function
of Hoff et al. (2002); Handcock et al. (2007); Krivitsky et al. (2009) is mainly highlighted
in Corollary 1: under the Gaussian hypothesis most of the integrals of Equations (3.1)-(3.8)
can be evaluated analytically since they become a convolution of two Gaussian densities,
which is solvable for any $d$. Also, quantities that do not have an exact expression, such as
$p_k$ or $k_{nn}(k)$, can be efficiently evaluated through numerical methods, since the number
of integrals to approximate is constant (depending on $d$, but not on $n$).
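
The numerical evaluation of $p_k$ mentioned in this remark can be illustrated for $d = 2$. The sketch below is one possible approach and not necessarily the one used by the authors. It assumes the closed form $\theta(z_s) = \tau (\varphi/(\gamma+\varphi)) \exp\{-\|z_s\|^2/(2(\gamma+\varphi))\}$ for $d = 2$, and uses the fact that, for $z_s \sim N(0, \gamma I_2)$, the variable $u = \exp\{-\|z_s\|^2/(2\gamma)\}$ is uniform on $(0, 1)$, so that the integral in D5 reduces to a one-dimensional integral over $u$ with $\theta = \tau (\varphi/(\gamma+\varphi))\, u^{\gamma/(\gamma+\varphi)}$. The function name and grid size are hypothetical.

```python
import math


def degree_distribution_2d(n, gamma, tau, phi, grid=4000):
    """Approximate p_0, ..., p_{n-1} for the Gaussian LPM with d = 2.

    Midpoint rule over u in (0, 1); intended for moderate n only, since
    the binomial coefficients are converted to floating point.
    """
    exponent = gamma / (gamma + phi)
    scale = tau * phi / (gamma + phi)
    binom = [math.comb(n - 1, k) for k in range(n)]
    p = [0.0] * n
    for m in range(grid):
        u = (m + 0.5) / grid
        theta = scale * u ** exponent
        for k in range(n):
            p[k] += binom[k] * theta ** k * (1 - theta) ** (n - 1 - k) / grid
    return p


p = degree_distribution_2d(n=100, gamma=1.0, tau=0.8, phi=0.5)
mean_degree = sum(k * pk for k, pk in enumerate(p))
print(f"total mass {sum(p):.6f}, mean degree {mean_degree:.3f}")
```

The cost of this computation depends on the grid size and on the number of degree values requested, but no network needs to be simulated. The mean of the resulting distribution can be compared with the closed-form average degree $(n-1)\tau\varphi/(2\gamma+\varphi)$ for $d = 2$.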
Remark. In Gaussian LPMs, a nonidentifiability issue arises between the parameters $\varphi$
and $\gamma$, since the factorial moments depend only on their ratio, $\varphi/\gamma$. We argue, however,
that both parameters should be included in our study, to keep the model as close as
possible to the original LPM of Hoff et al. (2002), and to provide a proper basis for
possible extensions, such as the Gaussian LPCM and the Gaussian LPMRE.


3.2 Clustering coefficient 


In this section, we take advantage of the Gaussian assumption to study the clustering 
coefficient value for Gaussian LPMs analytically. 

Since there is more than one definition for the clustering coefficient, we clarify that
the one used in this paper is the global clustering coefficient of Newman (2003a), equal to
three times the number of triangles divided by the number of connected triples of nodes.
Thanks to the exchangeability of actor labels, this quantity is an unbiased estimator of
the probability that, given two consecutive edges, the extremities of such a 2-step path
are themselves connected.


Proposition 1. Under assumptions A1-A3, the global clustering coefficient $C$ can be
written as:

$$ C = \frac{\int_{\mathcal{Z}} \int_{\mathcal{Z}} \int_{\mathcal{Z}} p(z_i)\, p(z_k)\, p(z_j)\, r(z_i, z_k)\, r(z_k, z_j)\, r(z_j, z_i)\, dz_i\, dz_k\, dz_j}{\int_{\mathcal{Z}} \int_{\mathcal{Z}} \int_{\mathcal{Z}} p(z_i)\, p(z_k)\, p(z_j)\, r(z_i, z_k)\, r(z_k, z_j)\, dz_i\, dz_k\, dz_j}. \tag{3.18} $$

Under the Gaussian LPM both the numerator and the denominator can be expressed
analytically, yielding the following result:

$$ C = \tau \left( \frac{\gamma + \varphi}{3\gamma + \varphi} \right)^{d/2}. \tag{3.19} $$

A proof of Proposition 1 is provided in Appendix A.4. We note that (3.19) gives
an exact result for the clustering coefficient of an LPM of any size. This is an interesting
result and contrasts with many network models, where the clustering coefficient can only
be recovered asymptotically. Some interesting consequences of (3.19) will be illustrated
in Section 4.3.
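
The closed form for the clustering coefficient can be checked against a Monte Carlo approximation of the ratio of integrals in Proposition 1. The sketch below is an illustration under stated assumptions: it takes the closed form to be $C = \tau ((\gamma+\varphi)/(3\gamma+\varphi))^{d/2}$, positions to be $N(0, \gamma I_d)$, and the Gaussian connection function of A5. Function names, the number of draws and the seed are hypothetical.

```python
import math
import random


def clustering_closed(d, gamma, tau, phi):
    """Closed-form global clustering coefficient of the Gaussian LPM."""
    return tau * ((gamma + phi) / (3 * gamma + phi)) ** (d / 2)


def clustering_monte_carlo(d, gamma, tau, phi, draws=100000, seed=1):
    """Monte Carlo estimate of the ratio of triple integrals."""
    rng = random.Random(seed)
    sd = math.sqrt(gamma)

    def r(a, b):
        sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return tau * math.exp(-sq / (2 * phi))

    triangles = 0.0
    triples = 0.0
    for _ in range(draws):
        zi, zk, zj = ([rng.gauss(0.0, sd) for _ in range(d)] for _ in range(3))
        open_path = r(zi, zk) * r(zk, zj)   # connected triple centred on k
        triples += open_path
        triangles += open_path * r(zj, zi)  # the triple is closed
    return triangles / triples


exact = clustering_closed(2, 1.0, 0.8, 0.5)
approx = clustering_monte_carlo(2, 1.0, 0.8, 0.5)
print(f"closed form {exact:.4f}, Monte Carlo {approx:.4f}")
```

Because the closed form does not involve $n$, the same value applies to networks of any size, which is the point made in the text.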


3.3 Connectivity properties 

The study of the theoretical properties of LPMs can be further extended, characterising
the connectivity structure of realised networks. To do so, we give the definition of a path
for a random graph, and show a general result about the connection of two nodes in
Gaussian LPMs, once their latent position is known.

Definition 3.3.1 (Path). Under assumptions A1-A3, a $k$-step path is a sequence of $k + 1$
distinct nodes $\{i_0, i_1, \dots, i_k\}$ such that an edge is present between every two consecutive
nodes, i.e. $y_{i_0 i_1} = y_{i_1 i_2} = \dots = y_{i_{k-1} i_k} = 1$.


Under the same assumptions, the probability of a $k$-step path appearing between two
nodes with latent information $z_i$ and $z_j$ can be written as:

$$ I_k(z_i, z_j) = \int_{\mathcal{Z}} \cdots \int_{\mathcal{Z}} p(z_1) \cdots p(z_{k-1})\, r(z_i, z_1)\, r(z_1, z_2) \cdots r(z_{k-1}, z_j)\, dz_1 \cdots dz_{k-1}. \tag{3.20} $$

For a Gaussian LPM, the integrals on the right-hand side of (3.20) involve Gaussian
kernels only, and therefore they can be evaluated exactly. We provide a more explicit
formula for $I_k(z_i, z_j)$ in the following Proposition:


Proposition 2. Under the Gaussian LPM, let $I_k(z_i, z_j)$ be defined as in (3.20), for any
$k = 1, 2, \dots, n-1$, $z_i$ and $z_j \in \mathbb{R}^d$. Define the following recurrence relations:

$$ \begin{cases} h_{r+1} = \tau (2\pi\varphi)^{d/2}\, h_r\, f_d(\alpha_r z_i; 0, \omega_r + \gamma) \\ \alpha_{r+1} = \dfrac{\alpha_r \gamma}{\omega_r + \gamma} \\ \omega_{r+1} = \dfrac{\omega_r \varphi + \omega_r \gamma + \gamma \varphi}{\omega_r + \gamma} \end{cases} \quad \text{with} \quad \begin{cases} h_1 = \tau (2\pi\varphi)^{d/2} \\ \alpha_1 = 1 \\ \omega_1 = \varphi \end{cases} \tag{3.21} $$

Then, the following result holds:

$$ I_k(z_i, z_j) = h_k\, f_d(z_j - \alpha_k z_i; 0, \omega_k), \quad \text{for } k = 1, 2, \dots, n-1. \tag{3.22} $$

The proof of Proposition 2 is provided in Appendix A.2.

Remark. Note that the previous result could be extended by integrating out the latent
positions $z_i$ and $z_j$ as well. However, this is not of interest for the present work.

The result of Proposition 2 is a useful tool for studying the statistical properties of
path lengths for Gaussian LPMs, which we develop in Section 4.4.
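
The recurrence of Proposition 2 is straightforward to code. The sketch below is an illustration rather than the authors' implementation: it assumes the recurrence reads $h_{r+1} = \tau (2\pi\varphi)^{d/2} h_r f_d(\alpha_r z_i; 0, \omega_r + \gamma)$, $\alpha_{r+1} = \alpha_r \gamma / (\omega_r + \gamma)$ and $\omega_{r+1} = (\omega_r\varphi + \omega_r\gamma + \gamma\varphi)/(\omega_r + \gamma)$, started from $h_1 = \tau(2\pi\varphi)^{d/2}$, $\alpha_1 = 1$, $\omega_1 = \varphi$. This form was re-derived here by completing the square in the Gaussian integrals. The names `gauss_pdf` and `path_probability` are hypothetical.

```python
import math


def gauss_pdf(sq_norm, d, var):
    """Density f_d(x; 0, var) of N(0, var * I_d), given ||x||^2."""
    return (2 * math.pi * var) ** (-d / 2) * math.exp(-sq_norm / (2 * var))


def path_probability(k, z_i, z_j, gamma, tau, phi):
    """I_k(z_i, z_j) for the Gaussian LPM via the recurrence."""
    d = len(z_i)
    zi_sq = sum(a * a for a in z_i)
    base = tau * (2 * math.pi * phi) ** (d / 2)
    h, alpha, omega = base, 1.0, phi
    for _ in range(k - 1):
        # h uses the current alpha and omega, so update it first
        h = base * h * gauss_pdf(alpha ** 2 * zi_sq, d, omega + gamma)
        alpha, omega = (alpha * gamma / (omega + gamma),
                        (omega * phi + omega * gamma + gamma * phi) / (omega + gamma))
    diff_sq = sum((b - alpha * a) ** 2 for a, b in zip(z_i, z_j))
    return h * gauss_pdf(diff_sq, d, omega)


for k in (1, 2, 3, 4):
    print(k, path_probability(k, [0.3, 0.0], [-0.6, 0.4], 1.0, 0.8, 0.5))
```

For $k = 1$ the function returns the connection probability $r(z_i, z_j)$ itself, which gives a quick consistency check.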


4 Properties of realised networks 

We now use the results in the previous section to obtain properties of the Gaussian LPM. 

A main drawback of all LPMs is that, given the complete set of latent positions, the
evaluation of the likelihood for the corresponding realised graph requires the calculation
of a distance matrix, with a computational and storage cost of $O(n^2)$. This cost is the
main obstacle to inference for large graphs, making estimation impractical for networks
larger than a few thousand nodes. The issue extends also to the generation of LPMs,
which is usually performed in two sequential steps: first latent positions are sampled,
and then edges are created with the Gaussian probability. The evaluation of the distance
matrix is thus needed in between the two steps. This makes any empirical study of the
properties of LPMs rather inefficient and limited to relatively small graphs.

By contrast, the results presented in Theorem 1 and related Corollaries involve either
exact formulas, which have negligible computational cost, or integral approximations
whose computational cost is independent of $n$. Hence, the analysis that we propose in
this Section does not require any intensive calculation and can be performed on networks
of any size. Note that Raftery et al. (2012) proposed a computational approximation to
overcome this difficulty, whereas here we provide exact results and analytical
approximations.


4.1 Characterisation of the degree distribution for LPMs 


Empirical evaluations (Newman 2003b) suggest that typically the proportion of nodes
with degree greater than $k$ is expected to be proportional to $k^{-\alpha}$, for a positive $\alpha$ which
can be as small as 2. Networks exhibiting such behaviour are usually referred to as
scale-free, and the corresponding degree distribution is said to follow a power-law decay. The
highly connected nodes, denoted hubs, fulfil a crucial role in defining the structure of the
network (Albert et al. 2000), and as a result this is a feature which many network models
aim to capture (Barabási and Albert 1999; Newman et al. 2001).

According to the results of the previous section, the theoretical degree distribution of
a Gaussian LPM has the form of a continuous mixture of binomials, and can be
approximated efficiently for any network size. Figure 4.1 shows approximate degree distributions
for various choices of model parameters.

Figure 4.1: Gaussian Latent Position Model: Approximate degree distribution for different sets
of model parameters $\tau$, $\gamma$, $\varphi$.
While the degree distributions of sparse networks often resemble Poisson distributions,
denser networks tend to be associated with more left-skewed shapes. However, the
theoretical degree distribution of LPMs in Figure 4.1 resembles a truncated shape, suggesting
that the model may not successfully represent heavy tails. It should be noted, however,
that truncated shapes do arise in social networks: data are often collected through
surveys, where each actor is asked to specify up to a fixed number of preferences, so that
the degree distribution will exhibit an artificial truncation at the corresponding value.
Popular social datasets have been obtained using such a design, such as Sampson's monks
data (Sampson 1968) and the Adolescent Health data (Handcock et al. 2007). Moreover,
some important empirical evidence has been shown in Dunbar (1992) demonstrating the
existence of a theoretical cognitive limit on the number of stable relationships that
social actors can maintain. Hence both power-law and non-power-law behaviours are of
interest in statistical modelling of networks.

We now propose a more rigorous analysis of the degree distribution using the
dispersion and skewness indexes, which can be evaluated through the exact formulas for the
factorial moments in (3.10).


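Corollary 3. Under the Gaussian LPM, the dispersion index is given by:

$$ \mathcal{D} = 1 + (n-2)\, \tau \left[ \frac{\varphi (2\gamma + \varphi)}{(\gamma + \varphi)(3\gamma + \varphi)} \right]^{d/2} - (n-1)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2}. \tag{4.1} $$

The proof is given in Appendix A.3.

Remark. The calculation of the skewness does not involve any simplification, and so it is
omitted here.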


The dispersion index can be used to assess how dispersed the distribution is when
compared to a Poisson, which has an index of 1. A value greater than 1 corresponds to an
overdispersed distribution while a value smaller than 1 corresponds to an underdispersed
one. The Binomial distribution arising from a finite Erdős-Rényi random graph has a
dispersion index smaller than 1, and so it qualifies as underdispersed.

Corollary 3 allows us to study how the model parameters $\tau$, $\gamma$ and $\varphi$ affect the
dispersion of the distribution. For $d = 2$, our results can be summarised as follows:

• When $\varphi = \gamma(\sqrt{n-1} - 2)$, the distribution has dispersion index 1, typical of a
Poisson distribution.

• When $\varphi < \gamma(\sqrt{n-1} - 2)$, the distribution has dispersion index greater than 1, so
that the distribution is overdispersed.

• When $\varphi > \gamma(\sqrt{n-1} - 2)$, the distribution has dispersion index smaller than 1,
typical of a Binomial distribution, and so is underdispersed.

Note that the characterisation does not depend on $\tau$.
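
The three regimes can be reproduced numerically. The sketch below is an illustration under stated assumptions: it computes the dispersion index as $1 + G''(1)/G'(1) - G'(1)$, using the closed-form factorial moments of the Gaussian LPM (assumed to be those of Corollary 1), and evaluates it at, below and above the threshold $\gamma(\sqrt{n-1} - 2)$ for $d = 2$. Function names and parameter values are hypothetical.

```python
import math


def dispersion_index(n, d, gamma, tau, phi):
    """Variance-to-mean ratio of the degree of a random node."""
    mean = (n - 1) * tau * (phi / (2 * gamma + phi)) ** (d / 2)
    second = ((n - 1) * (n - 2) * tau ** 2
              * (phi ** 2 / ((gamma + phi) * (3 * gamma + phi))) ** (d / 2))
    # Var = G''(1) + G'(1) - G'(1)^2, so Var / mean = 1 + G''/G' - G'
    return 1.0 + second / mean - mean


def poisson_threshold_2d(n, gamma):
    """Value of phi at which the dispersion index equals 1 when d = 2."""
    return gamma * (math.sqrt(n - 1) - 2.0)


n, gamma, tau = 101, 1.0, 0.3
threshold = poisson_threshold_2d(n, gamma)
for phi in (0.25 * threshold, threshold, 4.0 * threshold):
    print(f"phi = {phi:6.2f}: dispersion = {dispersion_index(n, 2, gamma, tau, phi):.4f}")
```

Changing `tau` rescales the distance of the index from 1 but not its sign, in line with the observation that the characterisation does not depend on $\tau$.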

The left panel of Figure 4.2 shows the dispersion as a function of the model
parameters. The motivation behind this result is that the Erdős-Rényi random graph model
is recovered as a special case asymptotically, as $\varphi$ gets larger. Therefore, as $\varphi$ increases,
the model degenerates and the degree distribution becomes binomial and thus
underdispersed, regardless of how sparse the network is. If $\varphi$ is small enough, namely below the
threshold, then the model is nondegenerate and produces networks with an overdispersed
degree distribution. Hence, Gaussian LPMs are able to represent degree heterogeneity,
since for many choices of the model parameters the degree distribution is overdispersed.
However, degree heterogeneity does not imply heavy tails or power-law behaviour.

Figure 4.2 (panels: Dispersion index for LPM; Skewness for LPM and Erdős-Rényi random graph): Gaussian Latent Position Model: Left: Dispersion index versus the ratio between
$\varphi$ and $\gamma$. The vertical line is the threshold corresponding to a Poisson dispersion. For larger
values of $\varphi$, the distributions arising are not more dispersed than an Erdős-Rényi random graph,
asymptotically degenerating to such a model as $\varphi$ gets larger. Right: Unless the graph is very
sparse, the skewness index for Gaussian LPMs (red line) is smaller than the skewness of an
Erdős-Rényi random graph (blue line) with the same average degree.

We now analyse the skewness index, which is useful for identifying asymmetries in
overdispersed distributions. In the case of degree distributions of networks, a negative
value of the skewness index corresponds to shapes exhibiting a left tail heavier than the
right one, while a positive value corresponds to the opposite behaviour. As a tool to assess
the presence of hubs, we expect a scale-free network to have a positive and relatively large
skewness index. However, as shown in the right panel of Figure 4.2, this scenario does
not arise in Gaussian LPMs.

Given that in Erdős-Rényi random graphs $p_k$ goes to zero at the rate $1/k!$ (i.e. power
laws are not represented), the right panel in Figure 4.2 shows that, unless the graph is
very sparse, Gaussian LPMs exhibit degree distributions that are always more skewed to
the left than those of the Erdős-Rényi model with the same average degree. Even for very
sparse networks, the difference is not large enough to justify the presence of a low-order
power-law tail.

This shows that Gaussian LPMs cannot capture power-law behaviour. They are able
to represent degree heterogeneity, but in the sense that degrees will not be concentrated
around the mean value, but will rather have a nontrivially dispersed distribution between
0 and a maximum degree value, confirming the shapes already shown in Figure 4.1.


4.2 Degree correlations 


In the study of networks, one is often interested in the mixing properties of the graph. 
One mixing structure arises when nodes that share common features are more likely to 
be linked. In the context of social networks, this behaviour is called homophily. 

A special case is mixing according to the nodes' degrees, called degree correlation. For 
example, one might be interested in whether the degrees of two random nearest neighbours 
are positively or negatively correlated. Positive correlation, or assortative mixing 
of the degrees, is a recurring feature in social networks (Newman and Park 2003; Newman 
2002a), in contrast to many other kinds of networks (World Wide Web, protein interactions, 
food webs; see Newman 2003b), which typically have negative degree correlation 
or disassortative mixing. 

Here, we illustrate the fact that Gaussian LPMs can represent assortative mixing in 
the degrees, using the results of Theorem 1. Equation (3.12) shows that the Average 
Nearest Neighbours' Degree (ANND) of an arbitrary node i is an exact function of its 
latent position $z_i$. The left panel of Figure 4.3 displays this function in terms of the 
distance between $z_i$ and the centre of the latent space. 

It is not surprising that nodes located closer to the centre have highly connected neighbours. 
In contrast, (3.8) provides a less explicit formula for the ANND index as a function 
of the degree of node i, rather than its distance from the centre. This quantity can be 
efficiently approximated for every degree value. The right panel of Figure 4.3 represents 
this case. The average degree of the neighbours of a node of degree k, $k_{nn}(k)$, appears 
to be a nondecreasing function of the degree k, indicating the presence of assortative 
mixing in the degrees, using the same criterion as Boguñá and Pastor-Satorras (2003). It 
follows that realised Gaussian LPM networks exhibit assortative mixing of the degrees, 
which suggests that they are well suited for social networks (Newman and Park 2003). 

Figure 4.3: Gaussian Latent Position Model. Left: Average degree of the closest neighbours of 
a node as a function of its distance from the centre. Nodes located in the centre will more likely 
connect to high degree nodes. Right: Average degree of the closest neighbours as a function 
of the degree of a node. The ANND index is clearly a nondecreasing function, verifying that 
Gaussian LPMs exhibit assortative mixing in the degrees of the nodes. 
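
The dependence of the ANND on the latent position can be explored numerically. The sketch below is a hedged illustration rather than the computation behind the figure: it assumes the edge probability r(z_i, z_j) = τ exp(-|z_i - z_j|² / (2φ)), the connection probability θ(z) = τ (φ / (γ + φ))^{d/2} exp(-|z|² / (2(γ + φ))), and the representation k_nn(z_s) = E[r(z_j, z_s) {1 + (n - 2) θ(z_j)}] / θ(z_s), where the expectation is taken over z_j ~ N_d(0, γ I) and approximated by Monte Carlo. All names and parameter values are hypothetical.

```python
import math
import random

def edge_prob(zi, zj, tau, phi):
    """Assumed Gaussian LPM edge probability between positions zi and zj."""
    sq = sum((a - b) ** 2 for a, b in zip(zi, zj))
    return tau * math.exp(-sq / (2 * phi))

def theta(z, tau, phi, gamma):
    """Assumed probability that a node located at z connects to a random node."""
    d = len(z)
    sq = sum(c * c for c in z)
    return tau * (phi / (gamma + phi)) ** (d / 2) * math.exp(-sq / (2 * (gamma + phi)))

def annd_at(zs, samples, n, tau, phi, gamma):
    """Monte Carlo estimate of the average neighbour degree of a node located at zs."""
    total = 0.0
    for zj in samples:
        total += edge_prob(zj, zs, tau, phi) * (1 + (n - 2) * theta(zj, tau, phi, gamma))
    return total / len(samples) / theta(zs, tau, phi, gamma)

rng = random.Random(1)
n, tau, phi, gamma, d = 100, 1.0, 1.0, 1.0, 2    # illustrative values only
samples = [[rng.gauss(0.0, math.sqrt(gamma)) for _ in range(d)] for _ in range(50000)]
annd_by_distance = {
    dist: annd_at([dist, 0.0], samples, n, tau, phi, gamma) for dist in (0.0, 1.0, 2.0)
}
print(annd_by_distance)
```

With these assumed values the estimate is largest at the centre and decreases with the distance from it, in line with the left panel described above.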


4.3 Asymptotics for the clustering coefficient 


Transitivity, defined as the propensity of two neighbours of a node also to be neighbours 
of one another, is ubiquitous in network analysis. In social networks, the tendency of 
three or more nodes to cluster is a feature of interest since it has a nontrivial relation 
with the structure of path lengths, for example impacting the dynamics of the spread of 
diseases (Newman 2003a, 2009; Kiss and Green 2008). 


LPMs capture transitivity in a very natural way. Indeed, when two actors have a 
neighbour in common, it is expected that the three corresponding nodes will be close in 
the latent space, making triangles more likely. This reasoning extends to higher order 
configurations as well. In this section, we show how Proposition 1 provides a more 
objective justification for this intuition. 

One well-known drawback of the Erdős-Rényi model is that it cannot capture transitivity 
when the network is large. To see this, let p be the connection probability and 
$\bar{k} = p(n - 1)$ be the expected average degree of the corresponding realised network. We 
focus on the realistic case where the size of the network increases (n tends to infinity), 
while $\bar{k}$ remains constant with respect to n. It follows that p must tend to zero as n 
increases, and so does C, since C = p in this model. Hence, asymptotically, the clustering 
coefficient for Erdős-Rényi random graphs is zero. 

Even more structured models, such as Exponential Random Graph Models, have been 
shown to degenerate asymptotically to Erdős-Rényi random graphs, under some nonrestrictive 
conditions (Chatterjee and Diaconis 2013), thus losing the ability to represent a 
nontrivial transitivity structure. 

In contrast, Gaussian LPMs can represent transitivity, even asymptotically. To see 
this, first recall the expression given earlier for the average degree of a random node in a 
Gaussian LPM. In order to have an asymptotically constant average degree $\bar{k}$, the 
parameters $\varphi$ and $\gamma$ should satisfy: 

\[
\varphi = \frac{2 \bar{k}^{2/d} \gamma}{(n - 1)^{2/d} \tau^{2/d} - \bar{k}^{2/d}}. \qquad (4.2)
\]

In the limit of large n, the corresponding clustering coefficient satisfies: 

\[
C = \frac{\tau}{3^{d/2}}. \qquad (4.3)
\]

Thus the limiting clustering coefficient has a non-zero value that can be as large as 
$3^{-d/2}$. This highlights an important difference between the Erdős-Rényi and Exponential 
Random Graph models on one hand, and LPMs on the other, in that the latter are able 
to represent transitivity in large networks. 
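
The scaling argument can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the average degree of a Gaussian LPM to be (n - 1) τ (φ / (2γ + φ))^{d/2}, solves that expression for φ at a fixed target degree, and takes the limiting clustering coefficient to be τ 3^{-d/2}. The function names are hypothetical.

```python
def phi_for_constant_degree(n, k_bar, gamma, tau=1.0, d=2):
    """Value of phi giving average degree k_bar, under the assumed degree formula."""
    a = (k_bar / ((n - 1) * tau)) ** (2 / d)
    return 2 * gamma * a / (1 - a)

def average_degree(n, phi, gamma, tau=1.0, d=2):
    """Assumed average degree of a Gaussian LPM with n nodes."""
    return (n - 1) * tau * (phi / (2 * gamma + phi)) ** (d / 2)

def limiting_clustering(tau=1.0, d=2):
    """Assumed large-n clustering coefficient of a Gaussian LPM."""
    return tau * 3 ** (-d / 2)

for n in (100, 10_000, 1_000_000):
    phi = phi_for_constant_degree(n, k_bar=10, gamma=1.0)
    # Columns: n, phi, recovered average degree, Erdős-Rényi clustering p = k_bar / (n - 1)
    print(n, phi, average_degree(n, phi, 1.0), 10 / (n - 1))
print(limiting_clustering(tau=1.0, d=2))
```

As n grows with the average degree held fixed, φ shrinks and the Erdős-Rényi clustering coefficient vanishes, while the assumed Gaussian LPM limit stays at τ 3^{-d/2}.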

Furthermore, the non-null clustering coefficient classifies Gaussian LPMs as highly 
clustered networks. Such models lack the loopless tree structure which simplifies the 
study of component sizes and path lengths. A review of the main difficulties arising when 
dealing with highly clustered models can be found in Newman (2002b). 


4.4 Path lengths 


In a well known experiment, Milgram (1967) observed that any two strangers are connected 
by a chain of intermediate acquaintances of length at most six. Later on, similar 
observations were made in Albert et al. (1999) about the connectivity of certain portions 
of the Internet, stating that any two web pages are at most 19 clicks away from one 
another. The small-world effect defines this behaviour exactly: given any two connected 
nodes, the shortest path from one node to the other will have an average length which is 
very small when compared to the size of the network n, typically comparable to log(n) 
or smaller (Newman 2001). The small-world property has motivated research on the 
connectivity of graphs, relevant to fields such as communication systems, epidemiology and 
optimisation. 

Hence, understanding how a statistical model relates to the small-world property 
is important. For LPMs, not much is known about the diameter and connectivity of 
the realised networks. Here, we use Proposition 2 to apply a procedure similar to that 
of Fronczak et al. (2004), showing how the distribution of the geodesic distances can 
be evaluated in a Gaussian LPM. We also characterise the average path length (APL) 
for Gaussian LPM networks of any size, giving insight into the asymptotic 
behaviour of this index. 


Fronczak et al. (2004) focused on the family of fitness models for networks, which 
includes Erdős-Rényi random graphs and the preferential attachment model of Barabási 
and Albert (1999). These models satisfy assumptions A1-A3, where the latent information 
is coded by a fitness value $h_i$, for every $i \in V$. Then, edge probabilities are given 
by: 

\[
r(h_i, h_j) = \frac{h_i h_j}{\beta}, \qquad (4.4)
\]

where $\beta$ is a suitable constant. The model degenerates to an Erdős-Rényi random graph 
when $h_i = \bar{k}$ for every i, and $\beta = \bar{k}(n - 1)$. 
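
A short sketch of the fitness construction may help fix ideas. It assumes the edge probability is the product of the two fitness values divided by the constant β, and it clips the result at 1, which is an addition made here for safety and not part of the model description. The function names are hypothetical.

```python
import random

def fitness_edge_prob(h_i, h_j, beta):
    """Edge probability of a fitness model; the clipping at 1 is an added safeguard."""
    return min(1.0, h_i * h_j / beta)

def sample_fitness_graph(h, beta, seed=0):
    """Sample an undirected graph with independent edges given the fitness values h."""
    rng = random.Random(seed)
    n = len(h)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if rng.random() < fitness_edge_prob(h[i], h[j], beta)
    ]

# Degenerate case: constant fitness k_bar and beta = k_bar * (n - 1)
n, k_bar = 200, 8.0
h = [k_bar] * n
beta = k_bar * (n - 1)
p = fitness_edge_prob(h[0], h[1], beta)   # equals k_bar / (n - 1), as in an Erdős-Rényi graph
edges = sample_fitness_graph(h, beta)
print(p, 2 * len(edges) / n)              # edge probability and realised average degree
```

In the degenerate case every pair has the same connection probability, so the sampled graph is an Erdős-Rényi random graph with expected average degree k_bar.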

Here, we exploit the fact that fitness models and LPMs both originate from LVMs, 
generalising the work of Fronczak et al. (2004) to a wider family of models. To study the 
connectivity of the networks and the path lengths' distribution, we focus on the quantities 
$\ell_k(z_i, z_j)$, defined as the probability that the shortest path between two nodes located 
in $z_i$ and $z_j$ has length k. We also define $r_k(z_i, z_j)$ as the probability that a path of 
length k exists between two nodes. In both definitions, and from now on, we condition 
on the fact that the two nodes are connected, i.e. that there exists a finite-length path 
that has the two nodes as extremes. Such an assumption is natural since usually statistics 
of path lengths are defined only for sets of connected nodes. Note that $\ell_k(z_i, z_j)$ differs 
from $r_k(z_i, z_j)$ in that the latter is the probability that there is at least one k-step path 
between the two nodes. We now describe a way to evaluate $\ell_k(z_i, z_j)$ efficiently, as a 
function of the model parameters of a Gaussian LPM. 


A9. The graphs considered are dense enough, such that for every pair of nodes $i, j \in V$, if there 
exists a path of length k between nodes i and j, then a path of length t exists between 
the same nodes for every $t = k + 1, \dots, n - 1$. 

Proposition 3. Under the Gaussian LPM and assumption A9, let i and j be any two 
nodes. Then the following two statements are equivalent: 

• The geodesic distance between i and j is at most k. 

• There exists a k-step path between i and j. 

The proof of Proposition 3 relies heavily on A9 and is straightforward. From Proposition 3 
it follows that, for any i and j: 

\[
r_k(z_i, z_j) = \sum_{t=1}^{k} \ell_t(z_i, z_j). \qquad (4.5)
\]

Moreover, since $\ell_1(z_i, z_j) = r_1(z_i, z_j) = r(z_i, z_j)$, the following holds: 

\[
\ell_k(z_i, z_j) = r_k(z_i, z_j) - r_{k-1}(z_i, z_j). \qquad (4.6)
\]

Hence, we aim to characterise $r_k(z_i, z_j)$, thereby deducing the properties of $\ell_k(z_i, z_j)$. 

Each possible path of length k from i to j can be thought of as a Bernoulli random 
variable, which is a success if all the edges involved in the path appear, and a failure 
if any of those edges fails to appear. For an Erdős-Rényi random graph with 
average degree $\bar{k} = (n - 1)p$, the parameter of such a random variable is $p^k$. For Gaussian 
LPMs, the success probability depends on the latent positions $z_i$ and $z_j$, and has been 
characterised in Proposition 2. 

However, we are interested in $r_k(z_i, z_j)$, which is the probability of the union of all 
the k-step paths from i to j. Unfortunately, these variables are not independent, since 
different paths will have edges in common. We circumvent this issue by treating 
all such paths as mutually independent, following the reasoning of Fronczak et al. (2004). 
This assumption makes sense when k is much smaller than n. In fact, for the purpose of 
the study of shortest path lengths, estimates of $r_k(z_i, z_j)$ will be needed only for small 
values of k, since in the general case $\ell_k(z_i, z_j)$ will drop to zero very quickly. 

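Using the results of Proposition 2 and Lemma 1 of Fronczak et al. (2004), we obtain: 

\[
\ell_k(z_i, z_j) \approx \exp\left\{ -n^{k-2} \psi_{k-1}(z_i, z_j) \right\} - \exp\left\{ -n^{k-1} \psi_k(z_i, z_j) \right\}, \qquad (4.7)
\]

where $\psi_k(z_i, z_j)$ is the success probability of a single k-step path between nodes located 
in $z_i$ and $z_j$, as characterised in Proposition 2. Equation (4.7) gives a general formula to 
evaluate the distribution of the geodesic distance $\ell_k(z_i, z_j)$ for every $k \ll n$ for dense 
Gaussian LPM networks. 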
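
For intuition, the same reasoning can be sketched in the simpler Erdős-Rényi case, where every k-step path has success probability p^k. The code below is an illustration only: it assumes roughly n^(k-1) candidate k-step paths between two nodes, treats them as independent, approximates the probability that at least one exists by 1 - exp(-n^(k-1) p^k), and differences consecutive values to obtain the geodesic distance distribution. The function name is hypothetical.

```python
import math

def er_geodesic_distribution(n, p, k_max=20):
    """Approximate distribution of geodesic distances in an Erdős-Rényi graph,
    assuming independent paths and about n^(k-1) candidate k-step paths."""
    r_prev, ell = 0.0, []
    for k in range(1, k_max + 1):
        expected_paths = (n * p) ** k / n        # n^(k-1) * p^k
        r_k = 1.0 - math.exp(-expected_paths)    # P(at least one k-step path)
        ell.append(r_k - r_prev)                 # P(shortest path has length exactly k)
        r_prev = r_k
    return ell

n, k_bar = 1000, 10                              # illustrative values only
ell = er_geodesic_distribution(n, k_bar / (n - 1))
apl = sum(k * l for k, l in enumerate(ell, start=1)) / sum(ell)
print([round(l, 4) for l in ell[:6]], apl)
```

The average path length is obtained by averaging k against the resulting distribution; for a latent position model the constant p^k would be replaced by a position-dependent path probability.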

In Figure 4.4, a comparison between the empirical and theoretical values obtained 
is shown. The first two panels of Figure 4.4 show how close the 
approximation of the path length distribution can be, for a dense Gaussian LPM network 
and a less dense one. Note that in less dense networks the assumption that $k \ll n$ is 
less likely to hold because more sparsity will imply longer shortest paths. 

Also, once $\ell_k(z_i, z_j)$ is known for every k, a straightforward evaluation of the APL 
can be obtained by averaging over all possible values of k, $z_i$ and $z_j$. The agreement of 
the estimation with the results from an empirical study is shown in the right panel of 
Figure 4.4. As expected, the estimation is more accurate for graphs with a higher average 
degree. However, the results show that such an index is more tolerant when assumptions 
tend to be violated, possibly because the bias is limited when values are averaged. 

Figure 4.5 shows that Gaussian LPMs typically have a higher APL than corresponding 
Erdős-Rényi random graphs. In the left panel, the APL is plotted against the average 
degree of the network. It appears that the sparser the network, the more marked the 
difference with Erdős-Rényi random graphs is. Instead, as the network gets denser, 

[Figure 4.4, three panels: "Geodesic distances: dense network", "Geodesic distances: sparse network", "Average path length vs size of network"]

Figure 4.4: Geodesic distances and average path lengths for the Gaussian LPM model. Left and 
centre: Comparison between empirical and theoretical values for the distribution of geodesic 
distances. Networks generated are composed of 100 nodes. The left panel corresponds to a more 
dense graph (average degree is approximately 42), while the one in the centre corresponds to a 
more sparse graph (average degree is approximately 1\). Right: Comparison between empirical 
(lines) and theoretical (dots) values of APL. The parameters $\tau$ and $\gamma$ are set to 1. 


Gaussian LPMs tend to behave more and more similarly to Erdős-Rényi random graphs. 
In the right panel of Figure 4.5, APL values are shown for larger Gaussian LPM networks. 
In this case the average degree is kept constant, highlighting the asymptotic behaviour 
of the statistic. 

APL values for the corresponding Erdős-Rényi random graphs are also shown in Figure 
4.5. The Gaussian LPM networks typically have a higher APL, which grows faster than 
the logarithm of the size of the network. 

Figure 4.6 illustrates a possible reason for this behaviour. The distance from a node to 
the centre of the latent space is plotted versus its geodesic distance to a second node picked 
at random. There is clear heterogeneity, in contrast with the behaviour of Erdős-Rényi 
random graphs. Clearly, when averaging over all the possible positions of the second 
randomly chosen node, important contributions are given by distant isolated nodes, thereby 
increasing the APL value. 


5 Advantages of random effects models 


In the previous section, we have shown that, although the Gaussian LPM can capture 
degree heterogeneity, it cannot represent the power-law behaviour of many observed degree 
distributions. In addition, the model has shortcomings in representing the small-world 
behaviour, in that the APL grows faster than the log of the number of actors. 

In the Logistic LPM context, Krivitsky et al. (2009) addressed similar issues by adding 
node-specific random effects to represent different levels of social involvement. Here, we 
propose an extension of the Gaussian LPM (namely the Gaussian LPMRE of Table 1) 

[Figure 4.5, right panel title: "Average path length vs size of the network"]

Figure 4.5: Left: APL against the average degree of a 1000-node network, compared with the 
corresponding Erdős-Rényi random graph. The two behaviours diverge for sparse graphs, in 
which case Gaussian LPMs exhibit a larger APL. Right: Asymptotic behaviour of the APL. 
The average degree of the network is kept constant while the size n is on the horizontal axis. 
The continuous lines represent the APL value for corresponding Erdős-Rényi random graphs with 
the same average degrees. APL is typically higher in the LPM, and grows proportionally to a function 
which dominates the logarithm. 


following the same reasoning. 

In the Gaussian LPMRE, the connectivity parameter $\varphi$ becomes node dependent, and 
is a realisation of an Inverse Gamma distribution with parameters $\beta_0$ and $\beta_1$. Essentially, 
an increase in $\varphi$ will mainly affect how prone the corresponding actor is to creating long-range 
connections, rather than short-range ones. This behaviour is in line with typical 
scenarios in large social networks, where hubs differ from ordinary nodes in that they 
entail connections between distant areas (or communities) of the graph, decreasing the 
average path length (Watts and Strogatz 1998). 

We can approximate (3.16) and (3.17) and characterise the factorial moments of the 
degree of a random node as a function of the model parameters $\tau, \gamma, \beta_0, \beta_1$, allowing an 
assessment of the extent to which such models can represent heavy tails. Since the value 
of $\tau$ makes no difference here, we fix it to 1. 

Table 2 shows that the variance of random effects does not have much influence on 
the average degree of the network. This is relevant for studying heavy tails, since sparser 
networks will naturally have a higher skewness index. Hence, if we keep the mean of the 
random effect constant and change the variance, not much of the change in the skewness 
index will be due to the network becoming sparser. 
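
Varying the variance of the random effects at a fixed mean requires mapping the pair (mean, variance) onto the two Inverse Gamma parameters. The sketch below assumes a shape and scale parametrisation, with mean scale / (shape - 1) and variance scale² / ((shape - 1)² (shape - 2)); whether this matches the roles of the two parameters in the model is an assumption, and the function names are hypothetical.

```python
import random

def inverse_gamma_params(mean, variance):
    """Shape and scale of an Inverse Gamma law with the given mean and variance."""
    shape = 2.0 + mean ** 2 / variance
    scale = mean * (shape - 1.0)
    return shape, scale

def sample_random_effects(n, mean, variance, seed=0):
    """Draw n node-specific random effects from the matching Inverse Gamma law."""
    rng = random.Random(seed)
    shape, scale = inverse_gamma_params(mean, variance)
    # If G ~ Gamma(shape, scale=1/scale), then 1/G ~ InverseGamma(shape, scale).
    return [1.0 / rng.gammavariate(shape, 1.0 / scale) for _ in range(n)]

shape, scale = inverse_gamma_params(mean=0.2, variance=10.0)
effects = sample_random_effects(100, mean=0.2, variance=10.0)
print(shape, scale, min(effects), max(effects))
```

A large variance pushes the shape towards 2, so a few nodes receive very large effects while most stay small, which is the mechanism that produces hubs.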

Figure 5.1 shows that an increase in the variance of the random effects does yield 

Figure 4.6: Average geodesic distance from a node as a function of its distance from the centre 
of the latent space. The network is composed of 10000 nodes, with $\tau = 1$. Clearly, nodes which 
are closer to the centre will be better positioned to reach easily many other nodes, thus having 
a smaller APL index. Such heterogeneity in the connectivity structure characterises Gaussian 
LPMs and separates them from Erdős-Rényi random graphs, justifying the larger values for global 
APL. 


an increase in the skewness index, corresponding to a right-skewed heavy-tailed shape. 
Therefore, these two results indicate that the heaviness of the tails can be controlled 
by changing the variance of the random effects, without changing the average degree of 
the network by much. The smallest skewness index is obtained with a null variance for 
random effects, which corresponds to the Gaussian LPM. 

But how heavy are the tails corresponding to a given positive skewness? Figure 
5.2 shows the empirical degree frequencies obtained through simulations of Gaussian 
LPMREs. The two panels on the left side of Figure 5.2 show the degree distribution for an 
LPM (on both standard and log-log scale), where the variance of random effects is set to 
a very small value. The right-hand panels are obtained with the same parameters, except 
for the variance of the random effect, which is increased to 10®. The average degrees for 
the two cases are 0.151n and 0.144n respectively, and the skewness indexes are -0.07 
and 2.53 respectively. The log-log scale plots are included to show that the decay 
switches from a high-order power-law (reasonably comparable to a Poissonian tail) to a 
power-law with an exponent which falls between 2 and 3. 

The results shown confirm that random effects can extend the family of networks 
represented using LPMs. However, other features of interest are non-trivially influenced. 
Hence, we propose an empirical study to explore how random effects affect the asymptotic 
behaviour of LPMRE with respect to small-world behaviour and transitivity. Simulations 

Table 2: Average degree of a network of 100 actors for different values of mean and variance of 
the nodal random effects. The variance has very little impact on the overall average degree of the 
network. This is an important property which is needed to state that any increase of skewness is 
not due to the network getting sparser. 

                                   Variance
Mean     0.0001     0.1       1        10       100      1000     100000
0.1       1.95      2.88     2.73     2.91     2.85     2.81      2.83
0.2       7.34      8.30     8.25     8.20     8.30     8.21      8.17
0.3      14.97     15.19    14.83    14.35    14.40    14.33     14.38
0.4      24.11     23.28    21.14    20.49    20.73    20.60     20.37


[Figure 5.1, vertical axis: Skewness index]

Figure 5.1: Skewness index versus variance of nodal random effects. An increase in the variance 
of the random effects leads to an increase of the skewness index, corresponding to heavier tails. 


of LPMREs are very inefficient, so the results are rather limited. However, such a procedure 
is the only feasible one, since theoretical results on the LPMRE are not available. 
In fact, we are currently investigating alternative ways to approach this analysis using 
more rigorous theoretical frameworks. 

In this experiment, we have selected a particular set of model parameters, generated 
a sequence of IID networks and studied the average features exhibited. Since we are 
interested in the asymptotic behaviour of APL and C, we have held the average degree 
approximately constant by imposing $\gamma \propto n$, with n increasing. Figure 5.3 illustrates the 
results. The left panel shows that an increase in the variance of the random effects results 
in a smaller APL. Furthermore, the APL growth as a function of n becomes slower than 
the log function, exhibiting the small-world behaviour. The right panel represents instead 

[Figure 5.2, bottom panel titles: "Degree distribution on log-log scale for a LPM without random effects" and "Degree distribution on log-log scale for a LPM with random effects"; horizontal axes: log-Degree]

Figure 5.2: Top: degree distributions for Gaussian LPMREs with null-variance random effects 
(left) and large-variance random effects (right). Bottom: corresponding degree distribution on 
the log-log scale. An increase in the variance of the random effects results in a heavier power-law 
tailed degree distribution. The average degrees are 0.151n and 0.144n for the case on the left and 
right respectively, while skewness indexes are -0.07 and 2.53 respectively. 


the empirical asymptotic clustering coefficient. Here, it appears that C tends to stabilise 
to a non-zero limiting value, which clearly depends on the variance of the random effects. 
Such an interaction between the presence of hubs and the clustering coefficient is to 
some extent expected, since in an extreme case, the n-node star, C is equal to zero. 

Considering the results shown in this Section, random effects can be regarded as a 
useful addition to LPMs to capture several important features that arise in large social 
networks. 

Figure 5.3: APL (left) and clustering coefficient (right) as a function of n, holding an approximately 
constant average degree. The remaining model parameters are $\tau = 1$, $\mathbb{E}[\varphi] = 0.6$ and 
$\gamma = 0.05(n - 1)$. The number of networks generated for each value of n is 1000. The dashed 
black lines represent the log function and the asymptotic value for C under the Gaussian LPM 
for the left and right panel respectively. 


6 Real data examples 


We have characterised the models introduced by showing how some important statistics 
of realised networks depend on the parameters of LPMs. We now show that several well 
known real social networks have statistics that can be well captured by a fitted LPM, 
using the following datasets: 


• Dolphins: This is a social network of frequent associations between 62 dolphins in 
a community living off Doubtful Sound, New Zealand (Lusseau et al. 2003). 

• Monks: This describes the interpersonal relations among 18 monks in a monastery 
(Sampson 1968). 

• Florentine: This describes the connections by marriage between 16 noble families 
in Florence during the Renaissance (Padgett 1994). 

• Prison: Data collected in the 1950s by John Gagnon from 67 prison inmates, each 
one being asked to specify his preferences among other participants (MacRae 1960). 

• High-tech: This network contains the friendship ties among 36 employees of a hi-tech 
company, which were gathered by means of the question: who do you consider 
to be a personal friend? (Krackhardt 1999). 

• Math method: 38 school superintendents were asked to indicate their friendship 
ties with other superintendents in the county with the following question: among 
the chief school administrators in Allegheny County (PA, USA), who are your three 
best friends? (Carlson 1965). 

• Sawmill: 36 employees of a sawmill were asked to quantify the time they spent 
discussing work matters with each of their colleagues (Michael and Massey 1997). 

• San Juan: Study carried out in a rural area in Costa Rica. Edges represent visiting 
frequencies between 75 families living in farms in a neighbourhood called San Juan 
Sur (De Nooy et al. 2011). 

• Network sciences (1589 nodes): Coauthorship network of scientists working on 
network theory and experiment (Newman 2006). 

• Geometry (7343 nodes): Coauthorship network of scientists working on computational 
geometry (Jones 2002). 

• Condensed Matter (16726 nodes): Coauthorships between scientists posting 
preprints on the Condensed Matter E-Print Archive (Newman 2001). 

• High energy (27770 nodes): Coauthorships between scientists posting preprints 
on the High-Energy Theory E-Print Archive (Newman 2001). 


Where necessary, the datasets have been transformed into binary undirected graphs 
(no self-edges), using standard reasonable procedures. 

We can obtain the following network statistics for the Gaussian LPM using Theorem 
1: the average degree $\bar{k}$, the clustering coefficient C, the average path length APL and 
the skewness index S. Table 3 shows their observed and theoretical values for the smaller 
datasets. 

The theoretical values shown in Table 3 correspond to model parameters chosen to 
match the observed with the theoretical $\bar{k}$ and C. This simple criterion performs well for 
the networks presented, as indicated by Figure 6.1, which shows theoretical and observed 
degree distributions. 
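
The observed statistics can be computed directly from an edge list. The following sketch is one possible implementation and not necessarily the one used for the tables: it assumes that C is the global transitivity (closed connected triples over all connected triples), that S is the standardised third central moment of the degree sequence, and that the APL averages shortest path lengths over connected pairs only. The function name is hypothetical.

```python
from collections import deque

def network_statistics(n, edges):
    """Average degree, transitivity, degree skewness and average path length
    of the binary undirected graph (no self-edges) induced by an edge list."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        if i != j:                # drop self-edges
            adj[i].add(j)
            adj[j].add(i)         # symmetrise directed or repeated records
    degrees = [len(a) for a in adj]
    k_bar = sum(degrees) / n

    # Each triangle is counted once per centre node, i.e. three times in total.
    closed = sum(1 for i in range(n) for j in adj[i] for h in adj[i]
                 if j < h and h in adj[j])
    triples = sum(deg * (deg - 1) // 2 for deg in degrees)
    clustering = closed / triples if triples else 0.0

    var = sum((x - k_bar) ** 2 for x in degrees) / n
    skew = sum((x - k_bar) ** 3 for x in degrees) / n / var ** 1.5 if var else 0.0

    total, pairs = 0, 0
    for s in range(n):            # breadth-first search from every node
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    apl = total / pairs if pairs else float("inf")
    return k_bar, clustering, skew, apl

# Toy example: a triangle (0, 1, 2) with a pendant node 3 attached to node 2
stats = network_statistics(4, [(0, 1), (1, 2), (0, 2), (2, 3), (3, 3)])
print(stats)
```

Model parameters can then be tuned until the theoretical average degree and clustering coefficient agree with the first two of these values.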

A slightly different study was carried out for the larger datasets, to assess to what 
extent the Gaussian LPMRE can represent the asymptotic scale-free decay of the degree 
distribution, for different orders of the power-law. We consider several collaboration 
networks where nodes correspond to authors and two nodes are linked if the corresponding 
scientists published a paper as coauthors. All the networks shown exhibit a power-law 
degree distribution, with different slopes, which vary in the range 1 to 4. Figure 6.2 shows 
the theoretical and observed degree distributions on the log-log scale, indicating that the 
asymptotic behaviour is reasonably well represented in all the cases. 


7 Conclusions 

The main contribution of this paper is to advance our understanding of Latent Position 
Models for networks by providing several probabilistic results. Our main results describe 
features of realised Latent Position networks, characterising their degree distribution. 

Table 3: Theoretical and observed statistics for small-sized social networks. Statistics shown are 
the average degree k, the clustering coefficient C, the average path length APL and the skewness 
index S. Following the criterion described, the average degree and the clustering coefficient are 
matched exactly in every case, while the corresponding skewness index and average path length 
are fairly close to the observed counterparts. 

Network (n)            Parameters                      k       C       S       APL
                       τ        φ
Dolphins (n=62)        0.810    0.232    Observed      5.129   0.309   0.292   3.357
                                         Theoretical   5.129   0.309   0.461   3.282
Monks (n=18)           0.763    2.115    Observed      6.667   0.465   0.877   1.68
                                         Theoretical   6.667   0.465   -0.05   1.724
Florentine (n=16)      0.302    2.460    Observed      2.5     0.191   0.424   2.486
                                         Theoretical   2.5     0.191   0.503   2.827
Prison (n=67)          0.776    0.180    Observed      4.239   0.288   0.855   3.355
                                         Theoretical   4.239   0.288   0.562   3.831
High-tech (n=36)       0.913    0.376    Observed      5.056   0.372   0.785   2.360
                                         Theoretical   5.056   0.372   0.376   2.749
Math method (n=38)     0.616    0.328    Observed      3.211   0.246   0.654   2.644
                                         Theoretical   3.211   0.246   0.612   3.480
Sawmill (n=36)         0.550    0.436    Observed      3.444   0.230   2.290   3.138
                                         Theoretical   3.444   0.230   0.558   3.210
San Juan (n=75)        0.657    0.186    Observed      4.133   0.245   1.622   3.485
                                         Theoretical   4.133   0.245   0.579   3.883



the mixing properties of the degrees, the clustering coefficient and the path lengths’ 
distribution. Although this work deals only with undirected graphs, the same results can 
be extended in a similar fashion to directed ones. 

Gaussian LPMs have been shown not to be appropriate for modelling scale-free networks, 
since the average degree frequencies exhibit a left-skewed and truncated shape. 
However, modifying the basic LPM to include nodal random effects resulted in the ability 
of the model to represent power-law degree distributions of different slopes in both 
simulated and real networks. 

It has also been shown that Gaussian LPMs have an asymptotically strictly positive 
clustering coefficient, in contrast to other well known models, such as Erdős-Rényi and 
Exponential Random Graph models, whose clustering coefficient is asymptotically zero. 

[Figure 6.1, visible panel titles: Dolphins, Monks, Florentine]

Figure 6.1: Comparison between the observed degree distributions (blue bars) and the theoretical 
ones (red lines) for several small-size real social networks. Datasets used (from top left by row): 
Dolphins, Monks, Florentine, Prison, High-tech, Math method, Sawmill, San Juan. 


This result suggests that LPMs can generate highly clustered networks and that they can 
capture the persistent clustering behaviour of large social networks. 

The average degree of the closest neighbours to a node has been characterised, showing 
that positive degree correlations arise in LPM networks. This is in line with observed 
social networks, where assortative mixing in the nodal degrees frequently occurs. 

It has also been shown how the distribution of geodesic distances can be efficiently 
approximated, yielding an analysis of the asymptotic behaviour of the average path length. 
It appears that dense LPM networks have the same behaviour as Erdős-Rényi random 
graphs, while sparser LPM networks do not exhibit the small-world effect. 

Through simulations, important advantages of using nodal random effects have been 
outlined, suggesting that the Gaussian LPMRE has properties that make it suitable for 
modelling large social networks. An important extension of this work would be to develop 
new strategies to study analytically the LPMRE and LPCM. 

Figure 6.2: Empirical (blue dots) and theoretical (green line) degree distributions on log-log scale 
for various large citation networks. The datasets exhibit different asymptotic power-law orders. 
Gaussian LPMREs reasonably represent the asymptotic tendency of the degree distributions in 
every case. Datasets used: Network sciences (top left), Geometry (top right), Condensed matter 
(bottom left), High energy (bottom right). 

A Appendix: proofs 

A.1 Theorem 1 

D1. This is straightforward since, for every $z_s \in \mathcal{Z}$: 

\[
\theta(z_s) = \Pr(y_{sj} = 1 \mid z_s) = \int_{\mathcal{Z}} p(z_j)\, r(z_s, z_j)\, dz_j.
\]


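D2. 

\[
\begin{aligned}
G(x) &= \sum_{k=0}^{n-1} x^k p_k
 = \sum_{k=0}^{n-1} x^k \int_{\mathcal{Z}} \cdots \int_{\mathcal{Z}} p(z_1) \cdots p(z_n) \Pr(D_s = k \mid \mathcal{P})\, dz_1 \cdots dz_n \\
&= \int_{\mathcal{Z}} \cdots \int_{\mathcal{Z}} \Big\{ \prod_{j=1}^{n} p(z_j) \Big\} E\big[x^{D_s} \mid \mathcal{P}\big]\, dz_1 \cdots dz_n \\
&= \int_{\mathcal{Z}} \cdots \int_{\mathcal{Z}} \Big\{ \prod_{j=1}^{n} p(z_j) \Big\} \Big\{ \prod_{j \neq s} E\big[x^{y_{sj}} \mid \mathcal{P}\big] \Big\}\, dz_1 \cdots dz_n \\
&= \int_{\mathcal{Z}} \cdots \int_{\mathcal{Z}} p(z_s) \Big\{ \prod_{j \neq s} p(z_j) \big[ x\, r(z_s, z_j) + 1 - r(z_s, z_j) \big] \Big\}\, dz_1 \cdots dz_n \\
&= \int_{\mathcal{Z}} p(z_s) \Big\{ \int_{\mathcal{Z}} p(z_j) \big[ x\, r(z_s, z_j) + 1 - r(z_s, z_j) \big]\, dz_j \Big\}^{n-1} dz_s \\
&= \int_{\mathcal{Z}} p(z_s) \Big\{ x \int_{\mathcal{Z}} p(z_j)\, r(z_s, z_j)\, dz_j + 1 - \int_{\mathcal{Z}} p(z_j)\, r(z_s, z_j)\, dz_j \Big\}^{n-1} dz_s \qquad (A.1) \\
&= \int_{\mathcal{Z}} p(z_s) \big[ x\, \theta(z_s) + 1 - \theta(z_s) \big]^{n-1} dz_s. \qquad (A.2)
\end{aligned}
\]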


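D3. The r-th factorial moment of $D_s$ corresponds to the r-th derivative of G evaluated 
in 1: 

\[
\begin{aligned}
\frac{\partial^r G}{\partial x^r}(x) &= \int_{\mathcal{Z}} p(z_s)\, \frac{\partial^r}{\partial x^r} \big[ x\, \theta(z_s) + 1 - \theta(z_s) \big]^{n-1} dz_s \\
&= \int_{\mathcal{Z}} p(z_s)\, (n-1) \cdots (n-r)\, \theta(z_s)^r \big[ x\, \theta(z_s) + 1 - \theta(z_s) \big]^{n-r-1} dz_s \qquad (A.3) \\
&= \frac{(n-1)!}{(n-r-1)!} \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)^r \big[ x\, \theta(z_s) + 1 - \theta(z_s) \big]^{n-r-1} dz_s,
\end{aligned}
\]

and the final formula evaluated in x = 1 gives (3.3). 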

D4. The average degree is the first factorial moment, thus: 

\[
\bar{k} = G'(1) = \frac{(n-1)!}{(n-2)!} \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)\, dz_s = (n-1) \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)\, dz_s. \qquad (A.4)
\]

D5. The distribution of the degree of a random node can be recovered by differentiating 
G as well. Indeed, using (A.3), for every k: 

\[
p_k = \frac{1}{k!} \frac{\partial^k G}{\partial x^k}(0) = \binom{n-1}{k} \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)^k \big[ 1 - \theta(z_s) \big]^{n-k-1} dz_s. \qquad (A.5)
\]


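D6. Define the PGF for the degree of a random node once its latent information is fixed 
to $z_s$: 

\[
\begin{aligned}
G(x; z_s) &= \sum_{k=0}^{n-1} x^k \Pr(D_s = k \mid z_s) \\
&= \int \Big\{ \prod_{j \neq s} p(z_j) \Big\} E\big[ x^{D_s} \mid \mathcal{P} \big]\, dz_{-s} \\
&= \Big\{ \int_{\mathcal{Z}} p(z_j) \big[ x\, r(z_s, z_j) + 1 - r(z_s, z_j) \big]\, dz_j \Big\}^{n-1} \\
&= \big[ x\, \theta(z_s) + 1 - \theta(z_s) \big]^{n-1}, \qquad (A.6)
\end{aligned}
\]

which is simply the PGF of a binomial random variable with parameters n - 1 and $\theta(z_s)$. 
Hence its average degree is $\bar{k}(z_s) = (n - 1)\, \theta(z_s)$. Note that $dz_{-s} = \prod_{j \neq s} dz_j$. 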

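D7. We now write down the PGF for the degree of a random neighbour of a node 
located in $z_s$: 

\begin{align*}
H(x; z_s) &= \sum_{k=0}^{n-1} x^k \Pr(D_j = k \mid y_{sj} = 1, z_s) \\
&= \int_{\mathcal{Z}} p(z_j \mid y_{sj} = 1, z_s) \sum_{k=0}^{n-1} x^k \Pr(D_j = k \mid y_{sj} = 1, z_s, z_j)\, dz_j \\
&= \int_{\mathcal{Z}} p(z_j \mid y_{sj} = 1, z_s)\, E\left[ x^{D_j} \mid y_{sj} = 1, z_s, z_j \right] dz_j .
\end{align*}

Note that $E[x^{D_j} \mid y_{sj} = 1, z_s, z_j]$ corresponds to the PGF for the so-called excess degree 
(Newman et al. 2001), i.e. the degree of a node at one extreme of an edge picked at 
random. Hence, such PGF is equal to $x\, G'(x; z_j) / G'(1; z_j)$, where $G$ has been defined in (A.6). Then: 

\begin{align*}
H(x; z_s) &= \int_{\mathcal{Z}} p(z_j \mid y_{sj} = 1, z_s) \left\{ \frac{x\, G'(x; z_j)}{G'(1; z_j)} \right\} dz_j \\
&= \int_{\mathcal{Z}} \frac{\Pr(y_{sj} = 1 \mid z_j, z_s)\, p(z_j)}{\Pr(y_{sj} = 1 \mid z_s)} \left\{ x \left[ x\theta(z_j) + 1 - \theta(z_j) \right]^{n-2} \right\} dz_j \tag{A.8} \\
&= \frac{1}{\theta(z_s)} \int_{\mathcal{Z}} p(z_j)\, r(z_j, z_s) \left\{ x \left[ x\theta(z_j) + 1 - \theta(z_j) \right]^{n-2} \right\} dz_j .
\end{align*}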
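Its average degree is then given by: 

\[
\bar{k}_{nn}(z_s) = H'(1; z_s) = \frac{1}{\theta(z_s)} \int_{\mathcal{Z}} p(z_j)\, r(z_j, z_s) \left\{ 1 + (n-2)\,\theta(z_j) \right\} dz_j .
\]

D8. The PGF for the degree of a neighbour of a node with degree k is given by: 

\begin{align*}
H(x; k) &= \sum_{r=0}^{n-1} x^r \Pr(D_j = r \mid D_s = k, y_{sj} = 1) \tag{A.9} \\
&= \sum_{r=0}^{n-1} x^r \int_{\mathcal{Z}} p(z_s \mid D_s = k) \Pr(D_j = r \mid z_s, y_{sj} = 1)\, dz_s \\
&= \frac{1}{p_k} \int_{\mathcal{Z}} p(z_s) \Pr(D_s = k \mid z_s)\, H(x; z_s)\, dz_s \\
&= \frac{1}{p_k} \int_{\mathcal{Z}} p(z_s) \left[ \frac{1}{k!} \frac{\partial^k G}{\partial x^k}(0; z_s) \right] H(x; z_s)\, dz_s \\
&= \frac{1}{p_k} \int_{\mathcal{Z}} p(z_s) \binom{n-1}{k} \theta(z_s)^k \left[ 1 - \theta(z_s) \right]^{n-k-1} H(x; z_s)\, dz_s ;
\end{align*}

and its first derivative evaluated at x = 1 yields: 

\[
\bar{k}_{nn}(k) = \frac{1}{p_k} \binom{n-1}{k} \int_{\mathcal{Z}} p(z_s)\, \theta(z_s)^k \left[ 1 - \theta(z_s) \right]^{n-k-1} \bar{k}_{nn}(z_s)\, dz_s .
\]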


A.1.1 Proof for Corollary 1 

Recall that a convolution of two Gaussian densities is still a Gaussian density: 

\[
\int_{\mathbb{R}^d} f_d(z_i; \mu_1, \gamma_1)\, f_d(z_j - z_i; \mu_2, \gamma_2)\, dz_i = f_d(z_j; \mu_1 + \mu_2, \gamma_1 + \gamma_2) ,
\]

for every $z_j, \mu_1, \mu_2 \in \mathbb{R}^d$ and every pair of positive real numbers $\gamma_1$ and $\gamma_2$. 
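The convolution identity can be checked numerically. The snippet below is only a sanity check in one dimension (d = 1), using a simple Riemann sum; the helper names and the test values are arbitrary assumptions.

```python
import math


def gauss(z, mu, var):
    """One-dimensional Gaussian density f_1(z; mu, var)."""
    return math.exp(-(z - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def convolve_at(zj, mu1, var1, mu2, var2, lo=-40.0, hi=40.0, steps=80000):
    """Riemann sum over z_i of f(z_i; mu1, var1) * f(z_j - z_i; mu2, var2)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        zi = lo + i * h
        total += gauss(zi, mu1, var1) * gauss(zj - zi, mu2, var2)
    return total * h


# Left-hand side: numerical convolution; right-hand side: Gaussian with summed parameters.
lhs = convolve_at(0.7, 0.3, 1.5, -0.2, 0.8)
rhs = gauss(0.7, 0.3 + (-0.2), 1.5 + 0.8)
```

The two values `lhs` and `rhs` should coincide up to numerical integration error.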

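That being said: 

D1. 

\begin{align*}
\theta(z_s) &= \int_{\mathbb{R}^d} p(z_j)\, r(z_s, z_j)\, dz_j \tag{A.10} \\
&= \int_{\mathbb{R}^d} f_d(z_j; 0, \gamma)\, \tau (2\pi\varphi)^{d/2} f_d(z_s - z_j; 0, \varphi)\, dz_j \tag{A.11} \\
&= \tau (2\pi\varphi)^{d/2} f_d(z_s; 0, \gamma + \varphi) \tag{A.12} \\
&= \tau \left( \frac{\varphi}{\gamma + \varphi} \right)^{d/2} \exp\left\{ -\frac{z_s^{\top} z_s}{2(\gamma + \varphi)} \right\} . \tag{A.13}
\end{align*}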
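D3. 

\begin{align*}
\frac{\partial^r G}{\partial x^r}(1) &= \frac{(n-1)!}{(n-r-1)!} \int_{\mathbb{R}^d} f_d(z_s; 0, \gamma)\, \theta(z_s)^r\, dz_s \\
&= \frac{(n-1)!}{(n-r-1)!}\, \tau^r \left( \frac{\varphi}{\gamma + \varphi} \right)^{rd/2} \int_{\mathbb{R}^d} f_d(z_s; 0, \gamma) \exp\left\{ -\frac{r\, z_s^{\top} z_s}{2(\gamma + \varphi)} \right\} dz_s \\
&= \frac{(n-1)!}{(n-r-1)!}\, \tau^r \left( \frac{\varphi}{\gamma + \varphi} \right)^{rd/2} \left\{ \frac{2\pi(\gamma + \varphi)}{r} \right\}^{d/2} \int_{\mathbb{R}^d} f_d(z_s; 0, \gamma)\, f_d\left( z_s; 0, \frac{\gamma + \varphi}{r} \right) dz_s \\
&= \frac{(n-1)!}{(n-r-1)!}\, \tau^r \left( \frac{\varphi}{\gamma + \varphi} \right)^{rd/2} \left\{ \frac{2\pi(\gamma + \varphi)}{r} \right\}^{d/2} \left\{ \frac{2\pi\left[ (r+1)\gamma + \varphi \right]}{r} \right\}^{-d/2} \\
&= \frac{(n-1)!}{(n-r-1)!}\, \tau^r \left\{ \frac{\varphi^r}{(\gamma + \varphi)^{r-1} \left[ (r+1)\gamma + \varphi \right]} \right\}^{d/2} . \tag{A.14}
\end{align*}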


D4. 

\[
\bar{k} = G'(1) = (n-1)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2} . \tag{A.15}
\]
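The closed forms (A.14) and (A.15) can be cross-checked against direct numerical integration of p(z)θ(z)^r. The following is a minimal sketch, assuming d = 1 for the quadrature; the function names, grid and parameter values are illustrative only.

```python
import math


def factorial_moment(n, r, tau, gamma, phi, d):
    """Closed form (A.14): r-th factorial moment of the degree under the Gaussian LPM."""
    falling = math.prod(range(n - r, n))  # (n-1)(n-2)...(n-r) = (n-1)!/(n-r-1)!
    ratio = phi ** r / ((gamma + phi) ** (r - 1) * ((r + 1) * gamma + phi))
    return falling * tau ** r * ratio ** (d / 2.0)


def factorial_moment_quadrature(n, r, tau, gamma, phi, half_width=12.0, steps=20000):
    """Same quantity for d = 1, integrating p(z) * theta(z)^r with a Riemann sum."""
    lo = -half_width * math.sqrt(gamma)
    h = -2.0 * lo / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        p = math.exp(-z * z / (2.0 * gamma)) / math.sqrt(2.0 * math.pi * gamma)
        t = tau * math.sqrt(phi / (gamma + phi)) * math.exp(-z * z / (2.0 * (gamma + phi)))
        total += p * t ** r
    return math.prod(range(n - r, n)) * total * h


# r = 1 recovers the average degree (A.15).
average_degree = factorial_moment(n=30, r=1, tau=0.7, gamma=1.2, phi=0.9, d=2)
```

With r = 1 the closed form reduces to (A.15), so `average_degree` here is (n - 1)τφ/(2γ + φ) for d = 2.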


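D7. 

\begin{align*}
\bar{k}_{nn}(z_s) &= 1 + \frac{n-2}{\theta(z_s)} \int_{\mathbb{R}^d} p(z_j)\, r(z_s, z_j)\, \theta(z_j)\, dz_j \\
&= 1 + \frac{n-2}{\theta(z_s)}\, \tau^2 (2\pi\varphi)^{d} \int_{\mathbb{R}^d} f_d(z_j; 0, \gamma)\, f_d(z_j; 0, \gamma + \varphi)\, f_d(z_s - z_j; 0, \varphi)\, dz_j \\
&= 1 + \frac{n-2}{\theta(z_s)}\, \tau^2 (2\pi\varphi)^{d} \left\{ 2\pi (2\gamma + \varphi) \right\}^{-d/2} \int_{\mathbb{R}^d} f_d\left( z_j; 0, \frac{\gamma(\gamma + \varphi)}{2\gamma + \varphi} \right) f_d(z_s - z_j; 0, \varphi)\, dz_j \\
&= 1 + (n-2)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2} \frac{f_d\left( z_s; 0, \frac{\gamma^2 + 3\gamma\varphi + \varphi^2}{2\gamma + \varphi} \right)}{f_d(z_s; 0, \gamma + \varphi)} \\
&= 1 + \left( \frac{n-2}{n-1} \right) \bar{k}\, \frac{f_d\left( z_s; 0, \frac{\gamma^2 + 3\gamma\varphi + \varphi^2}{2\gamma + \varphi} \right)}{f_d(z_s; 0, \gamma + \varphi)} . \tag{A.16}
\end{align*}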
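A quick way to gain confidence in (A.16) is to compare it with a direct numerical evaluation of the integral it starts from. The sketch below does so for d = 1; the helper names, integration grid and parameter values are assumptions made for illustration.

```python
import math


def f1(z, var):
    """Centred one-dimensional Gaussian density f_1(z; 0, var)."""
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def knn_closed(z, n, tau, gamma, phi):
    """Closed form (A.16) for d = 1."""
    kbar = (n - 1) * tau * math.sqrt(phi / (2.0 * gamma + phi))
    var = (gamma ** 2 + 3.0 * gamma * phi + phi ** 2) / (2.0 * gamma + phi)
    return 1.0 + (n - 2) / (n - 1) * kbar * f1(z, var) / f1(z, gamma + phi)


def knn_quadrature(z, n, tau, gamma, phi, half_width=12.0, steps=40000):
    """1 + (n-2)/theta(z) * integral of p(u) r(z, u) theta(u) du, by Riemann sum (d = 1)."""
    scale = tau * math.sqrt(2.0 * math.pi * phi)  # tau * (2 pi phi)^(d/2) with d = 1

    def theta(u):
        return scale * f1(u, gamma + phi)

    lo = -half_width * math.sqrt(gamma)
    h = -2.0 * lo / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        total += f1(u, gamma) * scale * f1(z - u, phi) * theta(u)
    return 1.0 + (n - 2) * total * h / theta(z)


closed = knn_closed(0.9, n=40, tau=0.6, gamma=1.0, phi=0.8)
numeric = knn_quadrature(0.9, n=40, tau=0.6, gamma=1.0, phi=0.8)
```

The two values should agree up to integration error for any latent position z.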
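A.1.2 Proof for Corollary 2 

D1. 

\begin{align*}
\theta(z_s) &= \int_{\mathbb{R}^d} \sum_{g=1}^{G} \pi_g f_d(z_j; \mu_g, \gamma_g)\, \tau (2\pi\varphi)^{d/2} f_d(z_s - z_j; 0, \varphi)\, dz_j \\
&= \tau (2\pi\varphi)^{d/2} \sum_{g=1}^{G} \pi_g \int_{\mathbb{R}^d} f_d(z_j; \mu_g, \gamma_g)\, f_d(z_s - z_j; 0, \varphi)\, dz_j \tag{A.17} \\
&= \tau (2\pi\varphi)^{d/2} \sum_{g=1}^{G} \pi_g f_d(z_s; \mu_g, \gamma_g + \varphi) .
\end{align*}

D4. 

\begin{align*}
\bar{k} &= (n-1) \int_{\mathbb{R}^d} \sum_{g=1}^{G} \pi_g f_d(z_s; \mu_g, \gamma_g)\, \tau (2\pi\varphi)^{d/2} \sum_{h=1}^{G} \pi_h f_d(z_s; \mu_h, \gamma_h + \varphi)\, dz_s \\
&= (n-1)\, \tau (2\pi\varphi)^{d/2} \sum_{g=1}^{G} \sum_{h=1}^{G} \pi_g \pi_h \int_{\mathbb{R}^d} f_d(z_s; \mu_g, \gamma_g)\, f_d(z_s; \mu_h, \gamma_h + \varphi)\, dz_s \tag{A.18} \\
&= (n-1)\, \tau (2\pi\varphi)^{d/2} \sum_{g=1}^{G} \sum_{h=1}^{G} \pi_g \pi_h f_d(\mu_g - \mu_h; 0, \gamma_g + \gamma_h + \varphi) .
\end{align*}

While D7 is straightforward from (3.5). 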
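The final expression for the average degree under a finite mixture of Gaussians is easy to evaluate directly. The function below is a minimal sketch of that double sum; the argument names (`weights`, `means`, `variances`) are hypothetical, and isotropic components with covariance γ_g I_d are assumed, as in the corollary.

```python
import math


def fd_at_distance(sq_dist, var, d):
    """Isotropic d-dimensional Gaussian density at squared distance sq_dist from its mean."""
    return math.exp(-sq_dist / (2.0 * var)) / (2.0 * math.pi * var) ** (d / 2.0)


def mixture_average_degree(n, tau, phi, weights, means, variances):
    """Average degree for latent positions drawn from a finite Gaussian mixture."""
    d = len(means[0])
    total = 0.0
    for pi_g, mu_g, gamma_g in zip(weights, means, variances):
        for pi_h, mu_h, gamma_h in zip(weights, means, variances):
            sq = sum((a - b) ** 2 for a, b in zip(mu_g, mu_h))
            total += pi_g * pi_h * fd_at_distance(sq, gamma_g + gamma_h + phi, d)
    return (n - 1) * tau * (2.0 * math.pi * phi) ** (d / 2.0) * total


# A single centred component should recover the Gaussian LPM average degree (A.15).
single = mixture_average_degree(25, 0.5, 0.8, [1.0], [(0.0, 0.0)], [1.1])
```

With one centred component the double sum collapses to a single term, which gives a simple consistency check against (A.15).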

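A.2 Proof of Proposition 2 

First, we recall a few properties of the Gaussian distribution through a Lemma: 

Lemma 1. Let $f_d(\cdot; \mu, \gamma)$ denote the d-dimensional Gaussian density centred in $\mu$, with 
covariance matrix $\gamma I_d$. Let also $x, u, v \in \mathbb{R}^d$ and $a, b, \alpha \in \mathbb{R}^{+}$. Then: 

\begin{align*}
f_d(x; u, a)\, f_d(x; v, b) &= f_d(u - v; 0, a + b)\, f_d\left( x; \frac{bu + av}{a + b}, \frac{ab}{a + b} \right) ; \tag{A.19} \\
f_d(\alpha x; u, a) &= \alpha^{-d} f_d\left( x; \frac{u}{\alpha}, \frac{a}{\alpha^2} \right) . \tag{A.20}
\end{align*}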
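Both identities of the Lemma can be verified pointwise. The check below is a small sketch in one dimension (d = 1, so the factor α^{-d} is simply 1/α); the test point and parameter values are arbitrary assumptions.

```python
import math


def f1(x, mean, var):
    """One-dimensional Gaussian density f_1(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


x, u, v, a, b, alpha = 0.4, -1.0, 0.7, 1.3, 0.6, 2.5

# Product of two Gaussian densities in the same argument.
lhs_product = f1(x, u, a) * f1(x, v, b)
rhs_product = f1(u - v, 0.0, a + b) * f1(x, (b * u + a * v) / (a + b), a * b / (a + b))

# Rescaling of the argument.
lhs_scale = f1(alpha * x, u, a)
rhs_scale = f1(x, u / alpha, a / alpha ** 2) / alpha
```

Each `lhs_*` value should equal the matching `rhs_*` value up to floating-point rounding.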

Here follows the proof of Proposition 2 by mathematical induction on k. If k = 1, 
then: 

\[
I_1(z_i, z_j) = h_1 f_d(z_j - \alpha_1 z_i; 0, \omega_1) = \tau (2\pi\varphi)^{d/2} f_d(z_i - z_j; 0, \varphi) = r(z_i, z_j) . \tag{A.21}
\]

Now assume that $I_k(z_i, z_j) = h_k f_d(z_j - \alpha_k z_i; 0, \omega_k)$; then we need to prove that 

\[
I_{k+1}(z_i, z_j) = h_{k+1} f_d(z_j - \alpha_{k+1} z_i; 0, \omega_{k+1}) ,
\]
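where $h_{k+1}$, $\alpha_{k+1}$, $\omega_{k+1}$ are defined recursively by (3.21). 

\begin{align*}
I_{k+1}(z_i, z_j) &= \int_{\mathbb{R}^d} \cdots \int_{\mathbb{R}^d} p(z_1) \cdots p(z_k)\, r(z_i, z_1) \cdots r(z_k, z_j)\, dz_1 \cdots dz_k \\
&= \int_{\mathbb{R}^d} p(z_k)\, r(z_k, z_j) \left\{ \int_{\mathbb{R}^d} \cdots \int_{\mathbb{R}^d} p(z_1) \cdots p(z_{k-1})\, r(z_i, z_1) \cdots r(z_{k-1}, z_k)\, dz_1 \cdots dz_{k-1} \right\} dz_k \tag{A.22} \\
&= \int_{\mathbb{R}^d} p(z_k)\, r(z_k, z_j)\, I_k(z_i, z_k)\, dz_k \\
&= \int_{\mathbb{R}^d} p(x)\, r(x, z_j)\, I_k(z_i, x)\, dx .
\end{align*}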

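Now, we introduce the Gaussian LPM assumptions and use the results of Lemma 1: 

\begin{align*}
I_{k+1}(z_i, z_j) &= \tau (2\pi\varphi)^{d/2} h_k \int_{\mathbb{R}^d} f_d(x; 0, \gamma)\, f_d(x - z_j; 0, \varphi)\, f_d(x - \alpha_k z_i; 0, \omega_k)\, dx \\
&= \tau (2\pi\varphi)^{d/2} h_k \int_{\mathbb{R}^d} f_d(x - z_j; 0, \varphi)\, f_d(-\alpha_k z_i; 0, \omega_k + \gamma)\, f_d\left( x; \frac{\gamma \alpha_k z_i}{\omega_k + \gamma}, \frac{\omega_k \gamma}{\omega_k + \gamma} \right) dx \\
&= \tau (2\pi\varphi)^{d/2} h_k\, \alpha_k^{-d} f_d\left( z_i; 0, \frac{\omega_k + \gamma}{\alpha_k^2} \right) \int_{\mathbb{R}^d} f_d(x - z_j; 0, \varphi)\, f_d\left( x; \frac{\gamma \alpha_k z_i}{\omega_k + \gamma}, \frac{\omega_k \gamma}{\omega_k + \gamma} \right) dx \\
&= h_{k+1} f_d\left( z_j; \frac{\gamma \alpha_k z_i}{\omega_k + \gamma}, \frac{\omega_k \gamma + \omega_k \varphi + \gamma \varphi}{\omega_k + \gamma} \right) \\
&= h_{k+1} f_d(z_j - \alpha_{k+1} z_i; 0, \omega_{k+1}) . \tag{A.23}
\end{align*}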
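The last two lines of the derivation give the update for the location and variance parameters: α_{k+1} = γα_k/(ω_k + γ) and ω_{k+1} = (γω_k + φω_k + γφ)/(ω_k + γ), starting from α_1 = 1 and ω_1 = φ. A minimal sketch of this iteration follows; the function name is hypothetical and the normalising factor h_k is deliberately left out because it depends on z_i.

```python
def path_parameters(k, gamma, phi):
    """Iterate the recursion read off from (A.23) and return (alpha_k, omega_k)."""
    alpha, omega = 1.0, phi  # base case k = 1, see (A.21)
    for _ in range(k - 1):
        # Both right-hand sides use the values from the previous step.
        alpha, omega = (
            gamma * alpha / (omega + gamma),
            (gamma * omega + phi * omega + gamma * phi) / (omega + gamma),
        )
    return alpha, omega


alpha_3, omega_3 = path_parameters(3, gamma=1.5, phi=0.7)
```

For k = 3 this returns the parameters that are needed for the triangle integral in the proof of the clustering coefficient.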


A.3 Proof of Corollary 3 

Let G be the PGF of the random variable D, denoting the degree of a node picked at 
random. Then the r-th derivative of G evaluated at 1 is equal to the r-th factorial moment 
of D, denoted here $c_r$: 

\[
c_r = \frac{\partial^r G}{\partial x^r}(1) = E\left[ D(D-1) \cdots (D-r+1) \right] . \tag{A.24}
\]

In particular: 

\begin{align*}
c_1 &= E[D] = m_1 \tag{A.25} \\
c_2 &= E[D(D-1)] = E[D^2] - E[D] = m_2 - m_1 \tag{A.26} \\
m_2 &= c_1 + c_2 , \tag{A.27}
\end{align*}

where $m_1$ and $m_2$ denote the first two non-central moments of D. That being said, using 
Corollary 1 the dispersion index can be evaluated exactly: 


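\begin{align*}
V &= \frac{E\left[ (D - m_1)^2 \right]}{m_1} = \frac{m_2 - m_1^2}{m_1} = \frac{m_2}{m_1} - m_1 = 1 + \frac{c_2}{c_1} - c_1 \\
&= 1 + \frac{(n-1)(n-2)\, \tau^2 \left\{ \frac{\varphi^2}{(\gamma + \varphi)(3\gamma + \varphi)} \right\}^{d/2}}{(n-1)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2}} - (n-1)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2} \\
&= 1 + (n-2)\, \tau \left\{ \frac{\varphi (2\gamma + \varphi)}{(\gamma + \varphi)(3\gamma + \varphi)} \right\}^{d/2} - (n-1)\, \tau \left( \frac{\varphi}{2\gamma + \varphi} \right)^{d/2} , \tag{A.28}
\end{align*}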


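which proves the corollary. Also, when d = 2, the threshold between underdispersion and 
overdispersion (that is, V = 1) is given by: 

\[
\frac{(n-2)(2\gamma + \varphi)}{(\gamma + \varphi)(3\gamma + \varphi)} = \frac{n-1}{2\gamma + \varphi} . \tag{A.29}
\]

Now, recalling that $\varphi > 0$ and $\gamma > 0$, this is equivalent to: 

\begin{align*}
(n-2)(2\gamma + \varphi)^2 - (n-1)(\gamma + \varphi)(3\gamma + \varphi) &= 0 \\
\varphi^2 + 4\gamma\varphi + 5\gamma^2 - n\gamma^2 &= 0 \tag{A.30} \\
\varphi &= \gamma \left( -2 \pm \sqrt{n-1} \right) .
\end{align*}

One solution is negative and thus not feasible, so the threshold is given by: 

\[
\varphi = \gamma \left( \sqrt{n-1} - 2 \right) .
\]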
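The dispersion index (A.28) and the d = 2 threshold are simple enough to evaluate directly. The sketch below uses hypothetical function names and arbitrary parameter values; it only restates the two formulas in code.

```python
import math


def dispersion_index(n, tau, gamma, phi, d):
    """Dispersion index V of the degree distribution, following (A.28)."""
    ratio_term = (n - 2) * tau * (
        phi * (2.0 * gamma + phi) / ((gamma + phi) * (3.0 * gamma + phi))
    ) ** (d / 2.0)
    mean_term = (n - 1) * tau * (phi / (2.0 * gamma + phi)) ** (d / 2.0)
    return 1.0 + ratio_term - mean_term


def threshold_phi(n, gamma):
    """Value of phi at which V = 1 when d = 2 (positive only for n > 5)."""
    return gamma * (math.sqrt(n - 1) - 2.0)


phi_star = threshold_phi(n=50, gamma=1.3)
v_at_threshold = dispersion_index(50, 0.6, 1.3, phi_star, d=2)
```

Evaluating V on either side of `phi_star` suggests that, for d = 2, values of φ below the threshold correspond to overdispersion and values above it to underdispersion.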


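A.4 Proof of Proposition 1 

Formula (3.18) is straightforward since it is obtained by conditioning on the latent 
information. We now show how to obtain the exact formula stated in the proposition under the Gaussian 
LPM. We solve the numerator $C_N$ and the denominator $C_D$ independently, starting with the denominator. 

\begin{align*}
C_D &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} p(z_i)\, p(z_k)\, p(z_j)\, r(z_i, z_k)\, r(z_k, z_j)\, dz_k\, dz_i\, dz_j \\
&= \int_{\mathbb{R}^d} p(z_k) \left\{ \int_{\mathbb{R}^d} p(z_i)\, r(z_i, z_k)\, dz_i \right\} \left\{ \int_{\mathbb{R}^d} p(z_j)\, r(z_k, z_j)\, dz_j \right\} dz_k \\
&= \int_{\mathbb{R}^d} p(z_k)\, \theta(z_k)^2\, dz_k \\
&= \frac{G''(1)}{(n-1)(n-2)} \tag{A.31} \\
&= \tau^2 \left\{ \frac{\varphi^2}{(\gamma + \varphi)(3\gamma + \varphi)} \right\}^{d/2} .
\end{align*}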
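Now we solve the numerator. 

\begin{align*}
C_N &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} p(z_i)\, p(z_k)\, p(z_j)\, r(z_i, z_k)\, r(z_k, z_j)\, r(z_j, z_i)\, dz_i\, dz_k\, dz_j \\
&= \int_{\mathbb{R}^d} p(z_i) \int_{\mathbb{R}^d} p(z_j)\, r(z_j, z_i) \left\{ \int_{\mathbb{R}^d} p(z_k)\, r(z_i, z_k)\, r(z_k, z_j)\, dz_k \right\} dz_j\, dz_i \\
&= \int_{\mathbb{R}^d} p(z_i) \int_{\mathbb{R}^d} p(z_j)\, r(z_j, z_i)\, I_2(z_i, z_j)\, dz_j\, dz_i \\
&= \int_{\mathbb{R}^d} p(z_i)\, I_3(z_i, z_i)\, dz_i , \tag{A.32}
\end{align*}

where $I_k(z_i, z_j)$ is defined in (3.20) for every $k \in \mathbb{N}$, $z_i \in \mathbb{R}^d$ and $z_j \in \mathbb{R}^d$. 
For more clarity, we define the recurring quantity 

\[
\lambda = \varphi^2 + 3\gamma\varphi + \gamma^2 . \tag{A.33}
\]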

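We first derive the quantities needed to write $I_3(z_i, z_j)$ explicitly: 

\[
\begin{cases}
\alpha_1 = 1 \\
\omega_1 = \varphi \\
h_1 = \tau (2\pi\varphi)^{d/2}
\end{cases}
\qquad
\begin{cases}
\alpha_2 = \dfrac{\gamma}{\gamma + \varphi} \\[1ex]
\omega_2 = \dfrac{\varphi (2\gamma + \varphi)}{\gamma + \varphi} \\[1ex]
h_2 = \tau^2 (2\pi\varphi)^{d} f_d(z_i; 0, \gamma + \varphi)
\end{cases}
\tag{A.34}
\]

\begin{align*}
\alpha_3 &= \frac{\alpha_2 \gamma}{\omega_2 + \gamma} = \frac{\gamma^2}{\lambda} ; \tag{A.35} \\
\omega_3 &= \frac{\omega_2 \varphi + \omega_2 \gamma + \gamma\varphi}{\omega_2 + \gamma} = \frac{\varphi (\gamma + \varphi)(3\gamma + \varphi)}{\lambda} ; \tag{A.36} \\
h_3 &= \tau^3 (2\pi\varphi)^{3d/2} f_d(z_i; 0, \gamma + \varphi)\, f_d\left( \frac{\gamma z_i}{\gamma + \varphi}; 0, \frac{\lambda}{\gamma + \varphi} \right) . \tag{A.37}
\end{align*}

Now, for $h_3$, we use Lemma 1 and join the two Gaussian densities: 

\begin{align*}
h_3 &= \tau^3 (2\pi\varphi)^{3d/2} \left( \frac{\gamma + \varphi}{\gamma} \right)^{d} f_d(z_i; 0, \gamma + \varphi)\, f_d\left( z_i; 0, \frac{\lambda (\gamma + \varphi)}{\gamma^2} \right) \tag{A.38} \\
&= \tau^3 (2\pi\varphi)^{d} \left\{ \frac{\varphi}{2\gamma + \varphi} \right\}^{d/2} f_d\left( z_i; 0, \frac{\lambda}{2\gamma + \varphi} \right) . \tag{A.39}
\end{align*}

Also: 

\begin{align*}
(1 - \alpha_3) &= \frac{\varphi (3\gamma + \varphi)}{\lambda} \tag{A.40} \\
\frac{\omega_3}{(1 - \alpha_3)^2} &= \frac{\lambda (\gamma + \varphi)}{\varphi (3\gamma + \varphi)} . \tag{A.41}
\end{align*}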
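Then, it follows: 

\begin{align*}
I_3(z_i, z_i) &= h_3 (1 - \alpha_3)^{-d} f_d\left( z_i; 0, \frac{\omega_3}{(1 - \alpha_3)^2} \right) \\
&= \tau^3 (2\pi\varphi)^{d} \left\{ \frac{\varphi}{2\gamma + \varphi} \right\}^{d/2} f_d\left( z_i; 0, \frac{\lambda}{2\gamma + \varphi} \right) \times \left\{ \frac{\lambda}{\varphi (3\gamma + \varphi)} \right\}^{d} f_d\left( z_i; 0, \frac{\lambda (\gamma + \varphi)}{\varphi (3\gamma + \varphi)} \right) . \tag{A.42}
\end{align*}

Collapsing again the Gaussian densities: 

\[
I_3(z_i, z_i) = \tau^3 \left\{ \frac{2\pi\varphi^2}{2 (3\gamma + \varphi)} \right\}^{d/2} f_d\left( z_i; 0, \frac{\gamma + \varphi}{2} \right) . \tag{A.43}
\]

We can now obtain the final result for the numerator: 

\begin{align*}
C_N &= \int_{\mathbb{R}^d} p(z_i)\, I_3(z_i, z_i)\, dz_i \\
&= \tau^3 \left\{ \frac{2\pi\varphi^2}{2 (3\gamma + \varphi)} \right\}^{d/2} \int_{\mathbb{R}^d} f_d(z_i; 0, \gamma)\, f_d\left( z_i; 0, \frac{\gamma + \varphi}{2} \right) dz_i \\
&= \tau^3 \left\{ \frac{\varphi}{3\gamma + \varphi} \right\}^{d} . \tag{A.44}
\end{align*}

The final formula for the clustering coefficient follows: 

\[
C = \frac{C_N}{C_D} = \frac{\tau^3 \left\{ \frac{\varphi}{3\gamma + \varphi} \right\}^{d}}{\tau^2 \left\{ \frac{\varphi^2}{(\gamma + \varphi)(3\gamma + \varphi)} \right\}^{d/2}} = \tau \left\{ \frac{\gamma + \varphi}{3\gamma + \varphi} \right\}^{d/2} . \tag{A.45}
\]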
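The closed form (A.45) can be compared with a crude Monte Carlo estimate of the ratio $C_N / C_D$ obtained by sampling triples of latent positions. This is only an illustrative sketch, not the procedure used in the paper; the function names, seed and number of draws are assumptions, and the estimate carries sampling noise.

```python
import math
import random


def clustering_closed_form(tau, gamma, phi, d):
    """Clustering coefficient of the Gaussian LPM, following (A.45)."""
    return tau * ((gamma + phi) / (3.0 * gamma + phi)) ** (d / 2.0)


def clustering_monte_carlo(tau, gamma, phi, d, draws=100000, seed=0):
    """Estimate C_N / C_D by averaging over random triples (z_i, z_j, z_k)."""
    rng = random.Random(seed)
    sd = math.sqrt(gamma)

    def r(u, v):
        # Gaussian connection probability tau * exp(-|u - v|^2 / (2 phi)).
        sq = sum((a - b) ** 2 for a, b in zip(u, v))
        return tau * math.exp(-sq / (2.0 * phi))

    num = 0.0
    den = 0.0
    for _ in range(draws):
        zi, zj, zk = ([rng.gauss(0.0, sd) for _ in range(d)] for _ in range(3))
        wedge = r(zi, zk) * r(zk, zj)  # open two-path i - k - j
        den += wedge
        num += wedge * r(zj, zi)       # closed triangle
    return num / den


exact = clustering_closed_form(0.9, 1.0, 1.5, 2)
estimate = clustering_monte_carlo(0.9, 1.0, 1.5, 2)
```

With these parameter values the closed form equals 0.5, and the Monte Carlo estimate should fall close to it, with the gap shrinking as `draws` grows.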
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